Re: a BSD identd

1999-07-13 Thread Ian Dowse
In message , "Brian F. Feldman" writes:
>On 13 Jul 1999, Ville-Pertti Keinonen wrote:
>
>> 
>> gr...@freebsd.org (Brian F. Feldman) writes:
>> 
>> > It's "out with the bad, in with the good." Pidentd code is pretty terrible.
>> > The only security concerns with my code were wrt FAKEID, and those were
>> > mostly fixed (mostly meaning that a symlink _may_ be opened, but it won't
>> > be read.) If anyone wants to audit my code for security, I invite them to.
>> 
>> Did you mean to avoid reading through symlinks using the open + fstat
>> method mentioned earlier in the thread?
>
>No, I meant to avoid opening a file the user couldn't, or reading from a dev.

Why not actually store the fake ID in a symbolic link? That way you just
do a readlink(), which would be safer, neater and faster than reading a
file. A user can set up a fake ID with something like:

ln -s "Warm-Fuzzy" .fakeid

Ian


To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-hackers" in the body of the message



Re: NFS problems due to getcwd/realpath

1999-07-15 Thread Ian Dowse
In message , Jan Conrad writes:
>after wondering for two years why FreeBSD (2.2.x ... 3.2) might lock up
>when an NFS server is down, I think I have found one reason for that (see
>kern/12609 - I now know it doesn't belong to kern - sorry).
>
>It is the implementation of getcwd (src/lib/libc/gen/getcwd.c). When
>examining the parent dir of a mounted filesystem, getcwd lstats every
>directory entry prior to the mountpoint to find out the name of the
>mountpoint (but it would only need the inode's device to do a rough
>check).

This should no longer be an issue with FreeBSD 3.x, as the system normally
uses the new _getcwd syscall. The old code is still in getcwd.c, but is
only used if the syscall isn't present (e.g. if running a 3.x executable
on a 2.2 system).

We use the following patch on all our 2.2-stable machines, which works
around the problem. This was submitted as PR bin/6658, but it wasn't
committed, as a backport of 3.x's _getcwd (which never occurred) was
considered to be a more appropriate change.

Ian

--- getcwd.c.orig   Tue Jun 30 15:38:44 1998
+++ getcwd.c	Tue Jun 30 15:39:08 1998
@@ -36,6 +36,7 @@
 #endif /* LIBC_SCCS and not lint */
 
 #include 
+#include 
 #include 
 
 #include 
@@ -169,7 +170,28 @@
 				if (dp->d_fileno == ino)
 					break;
 			}
-		} else
+		} else {
+			struct statfs sfs;
+			char *dirname;
+
+			/*
+			 * Try to get the directory name by using statfs on
+			 * the mount point.
+			 */
+			if (!statfs(up[3] ? up + 3 : ".", &sfs) &&
+			    (dirname = rindex(sfs.f_mntonname, '/')))
+				while((dp = readdir(dir))) {
+					if (ISDOT(dp))
+						continue;
+					bcopy(dp->d_name, bup, dp->d_namlen+1);
+					if (!strcmp(dirname + 1, dp->d_name) &&
+					    !lstat(up, &s) &&
+					    s.st_dev == dev &&
+					    s.st_ino == ino)
+						goto found;
+				}
+			rewinddir(dir);
+
 			for (;;) {
 				if (!(dp = readdir(dir)))
 					goto notfound;
@@ -187,7 +209,9 @@
 				if (s.st_dev == dev && s.st_ino == ino)
 					break;
 			}
+		}
 
+found:
 		/*
 		 * Check for length of the current name, preceding slash,
 		 * leading slash.





Re: Panic Kernel Dump to umass device?

2006-02-11 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Scott Long writes:
>You're correct that dumping is meant to be done with interrupts and task
>switching disabled.  The first thing that the umass driver is missing is
>a working CAM poll handler.  Without this, there is no way for command
>completions to be seen when interrupts are disabled.  Beyond that, I
>somewhat suspect that the USB stack expects to be able to push command
>completion work off to worker threads, at least for some situations, and
>that also will not work in the kernel dump environment.  So, there is a
>lot of work needed to make this happen.

The USB stack supports polled operations, so it's actually not too
hard to make this work. Below is a patch I had in one of my local
trees that adds a CAM poll handler to the umass driver. I've just
tested this and it does seem to make kernel dumping work, but I
guess it might not be as reliable as dumping to other devices.

Ian

Index: umass.c
===
RCS file: /dump/FreeBSD-CVS/src/sys/dev/usb/umass.c,v
retrieving revision 1.128
diff -u -r1.128 umass.c
--- umass.c 9 Jan 2006 01:33:53 -   1.128
+++ umass.c 11 Feb 2006 12:57:43 -
@@ -2627,21 +2627,17 @@
 	}
 }
 
-/* umass_cam_poll
- *	all requests are handled through umass_cam_action, requests
- *	are never pending. So, nothing to do here.
- */
 Static void
 umass_cam_poll(struct cam_sim *sim)
 {
-#ifdef USB_DEBUG
 	struct umass_softc *sc = (struct umass_softc *) sim->softc;
 
 	DPRINTF(UDMASS_SCSI, ("%s: CAM poll\n",
 		USBDEVNAME(sc->sc_dev)));
-#endif
 
-	/* nop */
+	usbd_set_polling(sc->sc_udev, 1);
+	usbd_dopoll(sc->iface);
+	usbd_set_polling(sc->sc_udev, 0);
 }
 
 
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Panic Kernel Dump to umass device?

2006-02-12 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Nate Nielsen writes:
>Thanks, that helps. It works nicely with a uhci USB controller.
>
>However when the ohci driver is in use, we crash somewhere in
>usb_transfer_complete. I'll look into this further.

You could try updating to the latest 6-stable usb code, which might
possibly help the ohci case. There were a number of quite severe
ohci issues fixed since 6.0-release that might trigger more easily
when using polling. In particular, these revisions may be of interest:

ohci.c 1.154.2.1
ohcivar.h 1.40.2.1
usbdi.c 1.91.2.1

Ian


Re: contiguous memory allocation problem

2006-07-02 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Hans Petter Selasky writes:
>But there is one problem, that has been overlooked, and that is High speed 
>isochronous transfers, which are not supported by the existing USB system. I 
>don't think that the EHCI specification was designed for scatter and gather, 
>when you consider this:
>
>8 transfers of 0xC00 bytes has to fit on 7 pages. If this is going to work, 
>and I am right, one page has to contain two transfers. (see page 43 of 
>ehci-r10.pdf)

I haven't looked into the details, but the text in section 3.3.3
seems to suggest that EHCI is designed to not require physically
contiguous allocations here either, so the same approach of using
bus_dmamap_load() should work:

  This data structure requires the associated data buffer to be
  contiguous (relative to virtual memory), but allows the physical
  memory pages to be non-contiguous. Seven page pointers are provided
  to support the expression of 8 isochronous transfers. The seven
  pointers allow for 3 (transactions) * 1024 (maximum packet size)
  * 8 (transaction records) (24576 bytes) to be moved with this
  data structure, regardless of the alignment offset of the first
  page.

Ian


Re: contiguous memory allocation problem

2006-07-02 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Hans Petter Selasky writes:
>On Sunday 02 July 2006 14:05, Ian Dowse wrote:
>>   This data structure requires the associated data buffer to be
>>   contiguous (relative to virtual memory), but allows the physical
>>   memory pages to be non-contiguous. Seven page pointers are provided
>>   to support the expression of 8 isochronous transfers. The seven
>>   pointers allow for 3 (transactions) * 1024 (maximum packet size)
>>   * 8 (transaction records) (24576 bytes) to be moved with this
>>   data structure, regardless of the alignment offset of the first
>>   page.
>
>3 * 1024 bytes = 0xC00 bytes
>
>8 * 0xC00 = 0x6000 bytes maximum
>
>According to this you need "6" "EHCI pages", because "6 * 0x1000 = 0x6000". 
>The seventh "EHCI page" is just there to allow one to start at any page 
>offset. There is no eighth "EHCI page".
>
>The only solution I see is to have a double-layer ITD. The first layer has
>the first 4 transfers activated, and the second layer has the last 4
>transfers activated.
>
>A little more complicated, but not impossible.

The trick is that if the 0x6000 bytes are contiguous in virtual
memory then they never span more than 6 pages so one iTD is enough.

i.e. you can just do malloc(0x6000) and you don't need multi-page
physically contiguous buffers or extra memory-memory copies regardless
of how the virtual buffer maps to physical pages. This seems to be
the general extent of scatter-gather support offered by the various
USB host controllers (modulo various caveats such as assuming pages
are >= 4k, handling physical addresses > 4GB on non-IOMMU hardware
and UHCI's lack of support for mid-packet non-contiguous page
boundaries).

Ian


Re: contiguous memory allocation problem

2006-07-02 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Ian Dowse writes:
>The trick is that if the 0x6000 bytes are contiguous in virtual
>memory then they never span more than 6 pages so one iTD is enough.

Sorry, I meant of course 6 page boundaries, which means no more
than 7 pages. This is why the 7 physical address slots in the iTD
are always enough for 8 x 3k transaction records if the 24k buffer
is contiguous in virtual memory.
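
The page count can be checked with a few lines of C. This is a
stand-alone sketch, assuming 4k pages as in the thread; the names
EHCI_PAGE_SIZE, pages_spanned() and max_pages_spanned() are made up for
the illustration and do not come from the USB code.

```c
#include <stddef.h>
#include <stdint.h>

#define EHCI_PAGE_SIZE	0x1000u	/* 4k pages, as assumed in the thread */

/* Number of pages touched by a virtually contiguous buffer. */
size_t
pages_spanned(uintptr_t start, size_t len)
{
	uintptr_t first, last;

	if (len == 0)
		return (0);
	first = start / EHCI_PAGE_SIZE;
	last = (start + len - 1) / EHCI_PAGE_SIZE;
	return ((size_t)(last - first + 1));
}

/* Worst case over every possible starting offset within a page. */
size_t
max_pages_spanned(size_t len)
{
	size_t max = 0, n;
	uintptr_t off;

	for (off = 0; off < EHCI_PAGE_SIZE; off++) {
		n = pages_spanned(off, len);
		if (n > max)
			max = n;
	}
	return (max);
}
```

For len = 0x6000 a page-aligned buffer touches 6 pages, and any other
starting offset touches 7, so max_pages_spanned(0x6000) is 7, matching
the seven page pointers in the iTD.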

Ian


Re: contiguous memory allocation problem

2006-07-02 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Hans Petter Selasky writes:
>Ok. So the solution to my problem is to use scatter and gather. I will see 
>about updating my USB system to do it like that.
>
>But there is one thing I do not understand yet. When you load a page that 
>physically resides above 4GB, because a computer has more than 4GB of memory, 
>how does "bus_dmamap_load()" move that page down below 4GB, so that the 
>32-bit USB host controllers can reach it?

What should happen is that bus_dma allocates a bounce buffer and
performs copies as required from within the bus_dmamap_sync() calls.
This is something I haven't been able to verify yet with the USB
code though, so there could easily be bugs there.

BTW, as far as I know bus_dma is also missing support for multi-segment
allocations, so for example if you ask it to allocate 16k in at
most 4 segments below the 4GB mark, it will actually attempt a
physically contiguous allocation. If this was fixed it could be
used by usbd_alloc_buffer() to give directly usable buffers without
contiguous allocations.

Ian


Re: Confusion in acpi_sleep_machdep().

2007-01-01 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
>I'm trying to figure out how the acpi_sleep_machdep() code works and
>there are a couple of lines I just don't understand:
>
>pm = vmspace_pmap(p->p_vmspace);
>cr3 = rcr3();
>#ifdef PAE
>load_cr3(vtophys(pm->pm_pdpt));
>#else
>load_cr3(vtophys(pm->pm_pdir));
>#endif
>
>page = PHYS_TO_VM_PAGE(sc->acpi_wakephys);
>pmap_enter(pm, sc->acpi_wakephys, page,
>   VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE, 1);
>
>First, why isn't it just using kernel_pmap ?  What's all the load_cr3()
>stuff for ?
>
>Second, why is it entering the physical address sc->acpi_wakephys
>as the virtual address in the pmap ?  Shouldn't it be using
>sc->acpi_wakeaddr there?
>
>Anybody know ?

I don't know the details, but acpi_sleep_machdep() sets up an
identity mapping in the current process's vmspace (hence using
virtual = physical). Lazy switching of address spaces means that
cr3 may not currently refer to the same vmspace, which would break
the identity mapping, so that's the reason for the load_cr3() calls.
See revision 1.22 for a bit more information.

Ian


Re: Mounting a CDROM in freeBSD 4.2

2001-01-17 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, "Daniel C. Sobral" writes:
>> and you must make sure your kernel is compiled with
>> options CD9660
>
>Err... no. The kld gets autoloaded if the kernel doesn't have cd9660
>compiled-in.

The error message that is printed is misleading though, and gives the
impression that cd9660 filesystem support is missing:

cd9660: No such file or directory

When mount(8) runs mount_cd9660, it gives it an argv[0] of the
filesystem type, i.e. 'cd9660'. That's where the cd9660 in the error
message comes from. Maybe mount_cd9660 (and other mount_* programs)
should provide a bit more information in the error message?

Ian

Index: mount_cd9660.c
===
RCS file: /home/iedowse/CVS/src/sbin/mount_cd9660/mount_cd9660.c,v
retrieving revision 1.15
diff -u -r1.15 mount_cd9660.c
--- mount_cd9660.c  1999/10/09 11:54:08 1.15
+++ mount_cd9660.c  2001/01/17 12:34:23
@@ -176,7 +176,7 @@
 		errx(1, "cd9660 filesystem is not available");
 
 	if (mount(vfc.vfc_name, mntpath, mntflags, &args) < 0)
-		err(1, NULL);
+		err(1, "%s on %s: mount", mntpath, dev);
 	exit(0);
 }
 





Re: Have root on vinum, one small problem..

2001-02-02 Thread Ian Dowse

In message <4FC7AFEB1135D4119926F87A88260D9530@TRSBS>, Chris Williams writes:
>
>Things seem to be working quite well, but there is one strange behavior which
>worries me; whenever I shut down, right after syncing I get a panic:
>
>panic: Vrele: negative ref cnt

I noticed this a while ago - it is due to inconsistent handling of
'rootvnode' in the kernel. You should find the details if you search
for 'rootvnode' in the -hackers archive.

The following patch should work around the panic by adding an extra
vnode reference for rootvp:

Ian

Index: init_main.c
===
RCS file: /FreeBSD/FreeBSD-CVS/src/sys/kern/init_main.c,v
retrieving revision 1.134.2.3
diff -u -r1.134.2.3 init_main.c
--- init_main.c 2000/09/07 19:13:36 1.134.2.3
+++ init_main.c 2001/02/02 16:01:52
@@ -456,6 +456,7 @@
 	VREF(fdp->fd_fd.fd_cdir);
 	VOP_UNLOCK(rootvnode, 0, &proc0);
 	fdp->fd_fd.fd_rdir = rootvnode;
+	VREF(fdp->fd_fd.fd_rdir);
 }
 SYSINIT(retrofit, SI_SUB_ROOT_FDTAB, SI_ORDER_FIRST, xxx_vfs_root_fdtab, NULL)






Re: open (vfs_syscalls.c:994) && NFS

2001-04-25 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Oliver Cook writes:
>After about a week there are hundreds of stuck
>httpd processes in exactly this state. It is not
>possible to attach to them, but information can
>be gleaned from a kernel backtrace:

Could you post the full output of "ps axl" on one of these machines?
In this output, search for other odd process states, especially
"vmopar", and include a gdb backtrace from these processes too.

This sounds like a problem I described in


http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=243599+249172+/usr/local/www/db/text/2000/freebsd-hackers/20001022.freebsd-hackers

(split URL is
http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=243599+249172+
/usr/local/www/db/text/2000/freebsd-hackers/20001022.freebsd-hackers
in case the above doesn't work)

Ian




Re: open (vfs_syscalls.c:994) && NFS

2001-04-25 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Oliver Cook writes:
>There are three processes stuck in vmopar. I include the backtrace
>of one of these below.

Thanks. That particular process is hanging because nfs_loadattrcache()
has noticed that the file shrunk, but it is not safe in this context
(from vm_fault) to do anything about it.

A workaround for this problem went into 4-stable at the end of last
October, so upgrading to a more recent -stable will stop these
hangs. As noted in the archived -hackers message I mentioned, there
is another related problem that still exists, but it seems to occur
much less frequently.

Ian




Re: open (vfs_syscalls.c:994) && NFS

2001-04-25 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Oliver Cook writes:
>However, the more noticeable problem was the processes stuck in
>nfsvin because of the broken directory entry. Have you any ideas
>as to what would be causing that particular problem which is
>plaguing our servers more than the vmopar problem?

The processes stuck in "nfsvinval" are just a side-effect of the
vmopar problem; they should go away too when you upgrade. I forget
the details, but I think the vmopar-hung process is holding some
lock so any other processes that try to access the same file hang
in nfsvinval. You can probably verify that every time there are
processes stuck in "nfsvinval" there is at least one process stuck
in "vmopar".

I haven't seen any evidence of the broken directory entries you
mention - maybe you're reading too far into the struct nameidata
fields in "nd". It may be normal for some fields to be uninitialised
or point at junk data.

Ian




Re: FreeBSD 4.2 ,kernel panic.

2001-05-14 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Andrea writes:
>MY FreeBSD 4.2 system has begun to crash  some time ago..

>fault virtual address   = 0x9ec03e00

This virtual address suggests that these crashes are caused by a
bug that was fixed around two months ago. See


http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=459199+462565+/usr/local/www/db/text/2001/freebsd-bugs/20010415.freebsd-bugs

for further details; updating to a more recent -stable will solve
this issue.

Ian




UFS large directory performance

2001-06-01 Thread Ian Dowse


Prompted by the recent discussion about performance with large
directories, I had a go at writing some code to improve the situation
without requiring any filesystem changes. Large directories can
usually be avoided by design, but the performance hit is very
annoying when it occurs. The namei cache may help for lookups, but
each create, rename or delete operation always involves a linear
sweep of the directory.

The idea of this code is to maintain a throw-away in-core data
structure for large directories, allowing all operations to be
performed quickly without the need for a linear search. The
experimental (read 'may trash your system'!) proof-of-concept patch
is available at:

http://www.maths.tcd.ie/~iedowse/FreeBSD/dirhash.diff

The implementation uses a hash array that maps filenames to the
directory offset where the corresponding directory entry exists.
A simple spillover mechanism is used to deal with hash collisions,
and some extra summary information permits the quick location of
free space within the directory itself for create operations.
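
To make the idea concrete, here is a toy user-space sketch in C. It is
not the dirhash patch: the structure, the names (toy_dir, name_hash and
so on), the fixed sizes and the hash function are all assumptions
chosen for brevity, and linear probing stands in for the "simple
spillover mechanism". Deletion and the free-space summary are left out.

```c
#include <string.h>

#define DIR_MAX		48	/* hypothetical maximum number of entries */
#define HASH_SIZE	64	/* more slots than entries, so probing ends */
#define NAMELEN		32

struct toy_dir {
	char	names[DIR_MAX][NAMELEN];	/* stands in for the on-disk entries */
	int	nentries;
	int	hash[HASH_SIZE];	/* slot -> entry offset, or -1 if empty */
};

static unsigned
name_hash(const char *name)
{
	unsigned h = 5381;

	while (*name != '\0')
		h = h * 33 + (unsigned char)*name++;
	return (h % HASH_SIZE);
}

void
toy_dir_init(struct toy_dir *d)
{
	int i;

	d->nentries = 0;
	for (i = 0; i < HASH_SIZE; i++)
		d->hash[i] = -1;
}

/* Append an entry and record its offset; returns the offset or -1. */
int
toy_dir_add(struct toy_dir *d, const char *name)
{
	unsigned slot;

	if (d->nentries >= DIR_MAX || strlen(name) >= NAMELEN)
		return (-1);
	strcpy(d->names[d->nentries], name);
	slot = name_hash(name);
	while (d->hash[slot] != -1)	/* collision: spill to next slot */
		slot = (slot + 1) % HASH_SIZE;
	d->hash[slot] = d->nentries;
	return (d->nentries++);
}

/* Find an entry without scanning the whole directory. */
int
toy_dir_lookup(const struct toy_dir *d, const char *name)
{
	unsigned slot;

	for (slot = name_hash(name); d->hash[slot] != -1;
	    slot = (slot + 1) % HASH_SIZE)
		if (strcmp(d->names[d->hash[slot]], name) == 0)
			return (d->hash[slot]);
	return (-1);
}
```

The hash table holds only offsets, so it can be thrown away and rebuilt
from the directory contents at any time, which is what makes the
in-core structure disposable.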

The in-core data structures have a memory requirement approximately
equal to half of the on-disk directory size. Currently there are
two sysctls that determine when directories get hashed:

 vfs.ufs.dirhashminsize Minimum directory on-disk size for which
                        hashing should be used (default 2.5k).
 vfs.ufs.dirhashmaxmem  Maximum system-wide amount of memory to
                        use for directory hashes (default 2Mb).

Even on a relatively slow machine (200MHz P5), I'm seeing a file
creation speed that remains at around 1000 creations/second for
directories with more than 100,000 entries. Without this patch, I
get less than 20 creations per second on the same directory (in
both cases soft-updates is enabled).

To test, apply the patch, and add "options UFS_DIRHASH" to the
kernel config.

Currently there are a number of features missing, and there is a
lot of code for debugging and sanity checking that may affect
performance. The main issues I'm aware of are:
- There is no LRU mechanism for directory hash data structures. The
  hash tables get freed when the in-core inode is recycled, but no
  attempt is made to free existing memory when the dirhashmaxmem limit
  is reached.
- The lookup code does not optimise the case where successive
  offsets from the hash table are in the same filesystem block.

Ian 




Re: UFS large directory performance

2001-06-02 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Matt Dillon writes:

>What are your commit plans?  It looks extremely well contained,
>it could be committed to -current and then -stable a few days later 
>without any destabilizing impact at all for when the option isn't
>specified.
...
>The only potential problem I see here is that you could end up
>seriously fragmenting the malloc pool you are using to allocate the
>slot arrays.  And, of course, the two issues you brought up in
>regards to regulating memory use.

Thanks for the comments :-) Yes, malloc pool fragmentation is a
problem. I think that it can be addressed to some extent by using
a 2-level system (an array of pointers to fixed-size arrays) instead
of a single large array, but I'm open to any better suggestions.

If the second-level array size was fixed at around 4k, that would
keep the variable-length first-level array small enough not to
cause too many fragmentation issues. The per-DIRBLKSIZ free space
summary array is probably relatively okay as it is now.

The other main issue, that of discarding old hashes when the memory
limit is reached, may be quite tricky to resolve. Any approach
based on doing this from ufsdirhash_build() is likely to become a
locking nightmare. My original idea was to have ufsdirhash_build()
walk a list of other inodes with hashes attached and free them if
possible, but that would involve locking the other inode, so things
could get messy.

Ian




Re: UFS large directory performance

2001-06-03 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Matt Dillon writes:
>
>I would further recommend a (dynamic) array of pointers at the first
>level as part of the summary structure.  Any given array entry would
>either point to the second level array (the 512 byte allocations),
>be NULL (no second level array was necessary), or be (void *)-1 which
>would indicate that the second level array was reclaimed for other
>uses.

Nice idea, but I'm not sure I see the benefit of partially reclaiming
second-level arrays. Because it is a hash array, there isn't really
the concept of a working set; a directory that is `in use' will
rarely see many create/rename/delete operations on a small fixed
set of filenames. The lookup case is already cached elsewhere. I
think an all-or-nothing approach is likely to perform better and
be simpler to implement. Even the lazy allocation of second-level
arrays is unlikely to help a lot if the hash function does its job
well.

>
>If the zone allocator is used for the second level block allocations
>it shouldn't be a problem.  You can (had better be able to!) put a mutex
>around zone frees in -current.

The locking issues I could see were more in the area of finding
inodes to free hashes from. A linked list of dirhash structures
could be maintained (protected by a mutex), but to free the dirhash
belonging to an inode, the inode would probably need to be locked.
That means dereferencing dirhash->dh_inode->i_vnode and trying to
lock it, so things become complex.

Ian




Re: read(2) and ETIMEDOUT

2001-06-07 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Graham Barr writes:

>Also why does this happen only every few hours ? There is a lot of
>data going through these connections maybe the timer for SO_RCVTIMEO
>is not being reset.
>
>But then we have another server, with a similar number of clients and
>data through put, but it does not suffer from this problem.

I suspect that the server seeing this problem has a client that
occasionally disappears from the network, or for whatever reason
fails to respond to any packets for a long time (something like 5
or 10 minutes). I've seen blocking TCP writes return ETIMEDOUT when
the network between the client and the server goes down. In the
non-blocking case I think the following can happen:

1) Client is connected to server.
2) Network goes down, or client is turned off
3) Server performs non-blocking write() on socket
4) Server uses poll/select/kevent waiting for data from socket
5) The write operation times out because no acknowledgements
   have been received. This occurs after TCP_MAXRXTSHIFT
   retransmits, so->so_error is set to ETIMEDOUT and the
   connection is shut down (I haven't read the code very
   carefully, so the details could be wrong).
6) select/poll/kevent notes the EOF condition, and says that
   the descriptor is ready to read.
7) read() returns the real error, which is ETIMEDOUT.

I guess this should possibly be documented in read(2), but in
practice there are numerous network errors that can be returned
from read(). Normal practice in single-process servers is to
consider any unknown errors from read(), write(), etc. as fatal
only to that client rather than to the whole server.

Ian




Re: UFS large directory performance

2001-06-08 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Terry Lambert writes:
>
>Use a chain allocator.  I would suggest using the zone
>allocator, but it has some fundamental problems that I
>don't think are really resolvable without a rewrite.

Heh, maybe, but I'm not sure I want to write a new allocator for
this :-) Based on Matt's suggestions, I implemented the 2-level
approach. It currently uses 256 slots per second-level block; these
1k blocks are allocated using zalloc(). The variable-length
first-level arrays are still allocated with malloc, but these don't
grow to more than a few kb in size unless the directories are
enormous.
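
The 2-level layout can be sketched in user-space C as below. This is an
illustration rather than the patch itself: the names are invented,
calloc() stands in for both zalloc() and malloc(), and a slot is
assumed to be a 4-byte integer so that 256 slots make a 1k block.

```c
#include <stdint.h>
#include <stdlib.h>

#define SLOTS_PER_BLOCK	256	/* 256 four-byte slots = 1k per block */

struct twolevel {
	int32_t	**blocks;	/* first level: one pointer per block */
	size_t	nblocks;
};

void
twolevel_free(struct twolevel *t)
{
	size_t i;

	for (i = 0; i < t->nblocks; i++)
		free(t->blocks[i]);	/* free(NULL) is harmless */
	free(t->blocks);
	t->blocks = NULL;
	t->nblocks = 0;
}

/* Allocate enough fixed-size blocks for nslots slots, all zeroed. */
int
twolevel_init(struct twolevel *t, size_t nslots)
{
	size_t i;

	t->blocks = NULL;
	t->nblocks = 0;
	if (nslots == 0)
		return (-1);
	t->nblocks = (nslots + SLOTS_PER_BLOCK - 1) / SLOTS_PER_BLOCK;
	t->blocks = calloc(t->nblocks, sizeof(*t->blocks));
	if (t->blocks == NULL) {
		t->nblocks = 0;
		return (-1);
	}
	for (i = 0; i < t->nblocks; i++) {
		t->blocks[i] = calloc(SLOTS_PER_BLOCK, sizeof(int32_t));
		if (t->blocks[i] == NULL) {
			twolevel_free(t);
			return (-1);
		}
	}
	return (0);
}

/* Address of a slot: block index, then index within the block. */
int32_t *
twolevel_slot(struct twolevel *t, size_t slot)
{
	return (&t->blocks[slot / SLOTS_PER_BLOCK][slot % SLOTS_PER_BLOCK]);
}
```

Only the first-level pointer array varies in size, and it needs one
pointer per 256 slots, which is why it stays small unless the directory
is enormous.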

There's now a simple LRU list of dirhash structures that have memory
attached, and a new function ufsdirhash_recycle() that will free
up memory when the sysctl limit is reached. Adding this required
some locking, but the problematic inode locking is avoided by
leaving the dirhash structure attached to the inode when its hash
array is freed.

An updated patch is available at

http://www.maths.tcd.ie/~iedowse/FreeBSD/dirhash.diff3

I haven't had a chance to do more than a minimal amount of testing,
so there may be many issues remaining.

Ian




Re: Strange request: Reading RX-50 (aka DEC Rainbow 100) disks

2001-06-10 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Warner Losh writes:

>I do have the option of connecting the hardware up to the floppy
>controller in my desktop too :-).  I have both the RX-50 drives, as
>well as a pair of TEAC FD55 drives (that do the same data rate as the
>RX-50's, with the same heads, but with only one drive per spindle and
>two read heads instead of one).  Trouble is, it looks like our floppy
>driver doesn't grok single sided 400k disks :-(.  That's what I'm
>looking to hack and advise on how to hack.

The fdcontrol program allows most of the parameters to be set to
match the disks, but unfortunately it cannot set the sector offset.
MS-DOS disk sectors are numbered starting at 1 (the sector offset
is 1), but it was common practice with old 8-bit CP/M-type systems
to choose sector numbers starting at 0x41, 0x81 or other values.

I was attempting something similar last summer, but with disks from
an Amstrad CPC computer. I used the following patch to the fd driver
and fdcontrol to allow the sector offset to be specified along with
the other parameters. It also allows a head offset to be specified,
which is useful for reading the second side of double-sided disks
that were written as single-sided disks with a hardware switch on
the side-select line (i.e. the head number written to disk does not
match the hardware head number).

The patch below is against RELENG_4 around Jan 2000, so it will
need updating. I'm also not sure what sector offset the DEC Rainbow
used - I think I have a Rainbow boot disk here, but I'd have to
dig out a 5.25" floppy drive to check :-) Once you get the settings
right, you can just dd the disk to an image file.

Ian


Index: sys/i386/include/ioctl_fd.h
===
RCS file: /dump/FreeBSD-CVS/src/sys/i386/include/Attic/ioctl_fd.h,v
retrieving revision 1.13
diff -u -r1.13 ioctl_fd.h
--- sys/i386/include/ioctl_fd.h 1999/12/29 04:33:02 1.13
+++ sys/i386/include/ioctl_fd.h 2001/06/10 15:36:24
@@ -86,6 +86,7 @@
 struct fd_type {
 	int	sectrac;		/* sectors per track */
 	int	secsize;		/* size code for sectors */
+	int	secoff;			/* starting sector number */
 	int	datalen;		/* data len when secsize = 0 */
 	int	gap;			/* gap len between sectors */
 	int	tracks;			/* total num of tracks */
@@ -95,6 +96,7 @@
 	int	heads;			/* number of heads */
 	int	f_gap;			/* format gap len */
 	int	f_inter;		/* format interleave factor */
+	int	headoff;
 };
 
 #define FD_FORM   _IOW('F', 61, struct fd_formb) /* format a track */
Index: sys/isa/fd.c
===
RCS file: /dump/FreeBSD-CVS/src/sys/isa/fd.c,v
retrieving revision 1.176
diff -u -r1.176 fd.c
--- sys/isa/fd.c2000/01/08 09:33:06 1.176
+++ sys/isa/fd.c2001/06/10 15:52:19
@@ -125,24 +125,24 @@
 
 static struct fd_type fd_types[NUMTYPES] =
 {
-{ 21,2,0xFF,0x04,82,3444,1,FDC_500KBPS,2,0x0C,2 }, /* 1.72M in HD 3.5in */
-{ 18,2,0xFF,0x1B,82,2952,1,FDC_500KBPS,2,0x6C,1 }, /* 1.48M in HD 3.5in */
-{ 18,2,0xFF,0x1B,80,2880,1,FDC_500KBPS,2,0x6C,1 }, /* 1.44M in HD 3.5in */
-{ 15,2,0xFF,0x1B,80,2400,1,FDC_500KBPS,2,0x54,1 }, /*  1.2M in HD 5.25/3.5 */
-{ 10,2,0xFF,0x10,82,1640,1,FDC_250KBPS,2,0x2E,1 }, /*  820K in HD 3.5in */
-{ 10,2,0xFF,0x10,80,1600,1,FDC_250KBPS,2,0x2E,1 }, /*  800K in HD 3.5in */
-{  9,2,0xFF,0x20,80,1440,1,FDC_250KBPS,2,0x50,1 }, /*  720K in HD 3.5in */
-{  9,2,0xFF,0x2A,40, 720,1,FDC_250KBPS,2,0x50,1 }, /*  360K in DD 5.25in */
-{  8,2,0xFF,0x2A,80,1280,1,FDC_250KBPS,2,0x50,1 }, /*  640K in DD 5.25in */
-{  8,3,0xFF,0x35,77,1232,1,FDC_500KBPS,2,0x74,1 }, /* 1.23M in HD 5.25in */
-
-{ 18,2,0xFF,0x02,82,2952,1,FDC_500KBPS,2,0x02,2 }, /* 1.48M in HD 5.25in */
-{ 18,2,0xFF,0x02,80,2880,1,FDC_500KBPS,2,0x02,2 }, /* 1.44M in HD 5.25in */
-{ 10,2,0xFF,0x10,82,1640,1,FDC_300KBPS,2,0x2E,1 }, /*  820K in HD 5.25in */
-{ 10,2,0xFF,0x10,80,1600,1,FDC_300KBPS,2,0x2E,1 }, /*  800K in HD 5.25in */
-{  9,2,0xFF,0x20,80,1440,1,FDC_300KBPS,2,0x50,1 }, /*  720K in HD 5.25in */
-{  9,2,0xFF,0x23,40, 720,2,FDC_300KBPS,2,0x50,1 }, /*  360K in HD 5.25in */
-{  8,2,0xFF,0x2A,80,1280,1,FDC_300KBPS,2,0x50,1 }, /*  640K in HD 5.25in */
+{ 21,2,1,0xFF,0x04,82,3444,1,FDC_500KBPS,2,0x0C,2 }, /* 1.72M in HD 3.5in */
+{ 18,2,1,0xFF,0x1B,82,2952,1,FDC_500KBPS,2,0x6C,1 }, /* 1.48M in HD 3.5in */
+{ 18,2,1,0xFF,0x1B,80,2880,1,FDC_500KBPS,2,0x6C,1 }, /* 1.44M in HD 3.5in */
+{ 15,2,1,0xFF,0x1B,80,2400,1,FDC_500KBPS,2,0x54,1 }, /*  1.2M in HD 5.25/3.5 */
+{ 10,2,1,0xFF,0x10,82,1640,1,FDC_250KBPS,2,0x2E,1 }, /*  820K in HD 3.5in */
+{ 10,2,1,0xFF,0x10,80,1600,1,FDC_250KBPS,2,0x2E,1 }, /*  800K in HD 3.5in */
+{  9,2,1,0xFF,0x20,80,1440,1,FDC_250KBPS,2,0x50,1 }, /* 

Re: Strange request: Reading RX-50 (aka DEC Rainbow 100) disks

2001-06-10 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Warner Losh writes:
>
>That's OK.  The Rainbow disks have sectors numbered 1 through 10, for
>both CP/M disks and MS-DOS disks.  This makes things easier to cope
>with.

Great, then no driver changes are required. I've just tried it; I
found a normal PC 5.25" drive, and I was able to read the DEC
Rainbow boot disk I have here by doing

# fdcontrol /dev/fd1
sectrac? []: 10
secsize? [2]:
datalen? [0xff]:
gap? [0x1b]:
tracks? [80]:
size? []: 800
steptrac? [1]:
trans? []: 1
heads? []: 1
f_gap? [0x54]:
f_inter? [1]:

# hd /dev/fd1 |less

Note: The `trans' values come from the 'FDC_???KBPS' #defines in
fdreg.h. A value of 1 is 'FDC_300KBPS' which is different to the
specs you quoted, but I think the PC standard 5.25" drive runs at
360rpm rather than 300. For a 300rpm drive you probably want a
trans value of 2 (250kbps).
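
The arithmetic behind the values entered above can be sketched in C. This is only an illustration: the function names are made up for this note, and it assumes that `size' is the total sector count, that a secsize code N means 128 << N bytes, and that the required data rate scales linearly with spindle speed.

```c
#include <stdint.h>

/* Bytes per sector for a size code N, assuming 128 << N (code 2 = 512). */
uint32_t fd_sector_bytes(unsigned secsize_code)
{
    return 128u << secsize_code;
}

/* Total sectors on the disk: the value typed at the "size?" prompt. */
uint32_t fd_total_sectors(uint32_t sectrac, uint32_t heads, uint32_t tracks)
{
    return sectrac * heads * tracks;
}

/*
 * Data rate needed to read media recorded at rate_kbps on a drive turning
 * at written_rpm when the reading drive turns at read_rpm (assumes the
 * rate scales linearly with rotation speed).
 */
uint32_t fd_scaled_rate_kbps(uint32_t rate_kbps, uint32_t written_rpm,
    uint32_t read_rpm)
{
    return rate_kbps * read_rpm / written_rpm;
}
```

With 10 sectors, 1 head and 80 tracks this gives the 800 entered for `size', and a 250kbps disk read in a 360rpm drive comes out at 300kbps, which matches the trans value of 1 used above.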

I just left the `gap' and `f_gap' values at their defaults; I don't
know the exact details of these fields, but I seem to remember that
they are only used during writing and formatting, so you can ignore
them for reading.

>for this project.  Any thumbnail about how to add a new type of drive
>to fd.c?  What parameters do I need for it?

You could add an entry to the fd_types array in fd.c, but that
requires linking the entry into a device node, so it's probably
easier to just use fdcontrol.

Ian

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: How to do proper locking

2005-08-06 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Hans Petter Selasky writes:
>Yes, you are right, but the problem is, that for most callback systems in the 
>kernel, there is no mechanism that will pre-lock some custom mutex before 
>calling the callback.
>
>I am not speaking about adding lines to existing code, but to add one extra 
>parameter to the setup functions, where the mutex that should be locked 
>before calling the callback(s) can be specified. If it is NULL, Giant will be 
>used.
>
>The setup functions I have in mind are for example:  "make_dev()", 
>"bus_setup_intr()", "callout_reset()" ... and in general all callback systems 
>that look like these.

Note that FreeBSD's callout subsystem does already have such a
mechanism. Just use callout_init_mtx() and the specified mutex will
be acquired before the callback is invoked. See callout(9) for more
details.
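
As a rough userland model of the idea (not the kernel API), the sketch below shows a callback handle that remembers a lock at setup time, with the dispatcher taking that lock around the call. Every name here (toy_mtx, toy_callout, toy_tick and so on) is hypothetical, and the "mutex" is just a flag so the example stays self-contained.

```c
#include <stddef.h>

/* Toy lock: only records whether it is held. Stand-in for a real mutex. */
struct toy_mtx { int held; };

typedef void (*toy_cb_t)(void *arg);

/* Hypothetical callback handle that remembers which lock protects it. */
struct toy_callout {
    struct toy_mtx *mtx;    /* lock to take around the callback, or NULL */
    toy_cb_t func;
    void *arg;
};

/* Setup: associate the lock once, instead of locking in every callback. */
void toy_callout_init_mtx(struct toy_callout *c, struct toy_mtx *m)
{
    c->mtx = m;
    c->func = NULL;
    c->arg = NULL;
}

void toy_callout_reset(struct toy_callout *c, toy_cb_t func, void *arg)
{
    c->func = func;
    c->arg = arg;
}

/* Dispatcher: acquire the associated lock, run the callback, release. */
void toy_callout_fire(struct toy_callout *c)
{
    if (c->func == NULL)
        return;
    if (c->mtx != NULL)
        c->mtx->held = 1;
    c->func(c->arg);
    if (c->mtx != NULL)
        c->mtx->held = 0;
}

/* Example driver state and callback; the callback never locks by itself. */
struct toy_softc { struct toy_mtx mtx; int ticks; int saw_lock_held; };

void toy_tick(void *arg)
{
    struct toy_softc *sc = arg;

    sc->saw_lock_held = sc->mtx.held;
    sc->ticks++;
}
```

The point of the design is that the locking policy lives in one place (the setup call), so a callback cannot forget to take the lock.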

Ian
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Low umass performance with USB 2.0 ports

2005-08-30 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, "Eygene A. Ryabinkin" writes:
>> 
>> What is filesystem has your USB drive?
> The one I was extensively testing has FAT, but I've checked the UFS2 --
>just a bit better -- 1.8 Mb/second. But you're right -- no wdrains at all.
>> FreeBSD 4.x had very low performance with FAT filesystem,
>> writing process spent lots of time in the wdrain state too.
> Yes, it has. But here the same flash drive gives different results for
>ehci and uhci devices, and the total speed of echi is lower due to wdrains:
>300 Kb/sec versus 500 Kb/sec. And I sometimes write my data to the Windows
>partition with FAT to my home HDD -- it has no wdrains. At least, I've not
>noticed them. For flash I can.

The patch from the email below may help with the wdrain state -
can you see if it makes any difference?

Ian



Date:Sun, 26 Jun 2005 17:42:44 BST
To:  Stefan Walter <[EMAIL PROTECTED]>
cc:  freebsd-stable@freebsd.org
From:Ian Dowse <[EMAIL PROTECTED]>
Subject: Re: EHCI: mtools stuck in state 'physrd' or panic 

OpenBSD have a workaround for problems with VIA EHCI controllers
that can cause the hanging symptoms you describe. Below is a patch
that implements their change in FreeBSD's driver. Could you try it
to see if it helps?

Thanks,

Ian

Index: sys/dev/usb/ehci.c
===
RCS file: /dump/FreeBSD-CVS/src/sys/dev/usb/ehci.c,v
retrieving revision 1.14.2.9
diff -u -r1.14.2.9 ehci.c
--- sys/dev/usb/ehci.c  31 Mar 2005 19:47:11 -  1.14.2.9
+++ sys/dev/usb/ehci.c  26 Jun 2005 16:21:11 -
@@ -155,6 +155,7 @@
 Static void ehci_idone(struct ehci_xfer *);
 Static void ehci_timeout(void *);
 Static void ehci_timeout_task(void *);
+Static void ehci_intrlist_timeout(void *);
 
 Static usbd_status ehci_allocm(struct usbd_bus *, usb_dma_t *, u_int32_t);
 Static void ehci_freem(struct usbd_bus *, usb_dma_t *);
@@ -491,6 +492,7 @@
EOWRITE4(sc, EHCI_ASYNCLISTADDR, sqh->physaddr | EHCI_LINK_QH);
 
usb_callout_init(sc->sc_tmo_pcd);
+   usb_callout_init(sc->sc_tmo_intrlist);
 
lockinit(&sc->sc_doorbell_lock, PZERO, "ehcidb", 0, 0);
 
@@ -694,6 +696,11 @@
ehci_check_intr(sc, ex);
}
 
+   /* Schedule a callout to catch any dropped transactions. */
+   if ((sc->sc_flags & EHCI_SCFLG_LOSTINTRBUG) &&
+   !LIST_EMPTY(&sc->sc_intrhead))
+   usb_callout(sc->sc_tmo_intrlist, hz, ehci_intrlist_timeout, sc);
+
 #ifdef USB_USE_SOFTINTR
if (sc->sc_softwake) {
sc->sc_softwake = 0;
@@ -942,6 +949,7 @@
EOWRITE4(sc, EHCI_USBINTR, sc->sc_eintrs);
EOWRITE4(sc, EHCI_USBCMD, 0);
EOWRITE4(sc, EHCI_USBCMD, EHCI_CMD_HCRESET);
+   usb_uncallout(sc->sc_tmo_intrlist, ehci_intrlist_timeout, sc);
usb_uncallout(sc->sc_tmo_pcd, ehci_pcd_enable, sc);
 
 #if defined(__NetBSD__) || defined(__OpenBSD__)
@@ -2701,6 +2708,30 @@
splx(s);
 }
 
+
+/*
+ * Some EHCI chips from VIA seem to trigger interrupts before writing back the
+ * qTD status, or miss signalling occasionally under heavy load.  If the host
+ * machine is too fast, we we can miss transaction completion - when we scan
+ * the active list the transaction still seems to be active.  This generally
+ * exhibits itself as a umass stall that never recovers.
+ *
+ * We work around this behaviour by setting up this callback after any softintr
+ * that completes with transactions still pending, giving us another chance to
+ * check for completion after the writeback has taken place.
+ */
+void
+ehci_intrlist_timeout(void *arg)
+{
+   ehci_softc_t *sc = arg;
+   int s = splusb();
+
+   DPRINTFN(3, ("ehci_intrlist_timeout\n"));
+   usb_schedsoftintr(&sc->sc_bus);
+
+   splx(s);
+}
+
 //
 
 Static usbd_status
Index: sys/dev/usb/ehci_pci.c
===
RCS file: /dump/FreeBSD-CVS/src/sys/dev/usb/ehci_pci.c,v
retrieving revision 1.14.2.2
diff -u -r1.14.2.2 ehci_pci.c
--- sys/dev/usb/ehci_pci.c  13 Jun 2005 09:00:19 -  1.14.2.2
+++ sys/dev/usb/ehci_pci.c  26 Jun 2005 16:21:11 -
@@ -303,6 +303,10 @@
return ENXIO;
}
 
+   /* Enable workaround for dropped interrupts as required */
+   if (pci_get_vendor(self) == PCI_EHCI_VENDORID_VIA)
+   sc->sc_flags |= EHCI_SCFLG_LOSTINTRBUG;
+
/*
 * Find companion controllers.  According to the spec they always
 * have lower function numbers so they should be enumerated already.
Index: sys/dev/usb/ehcivar.h
===
RCS file: /dump/FreeBSD-CVS/src/sys/dev/u

Re: NFS/VM deadlock report and help request

2000-10-20 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Vadim Belman writes:

>wmesg=0xc0233171 "vmopar", timo=0) at ../../kern/kern_synch.c:467
...
>#8  0xc01dd606 in vm_fault (map=0xdc3e7e80, vaddr=712876032, 
>fault_type=1 '\001', fault_flags=0) at ../../vm/vm_pager.h:130


If anyone is interested, here are a few further details from my
mailbox. The patch David included appears to have solved this
particular problem for us, but there is another similar problem
lurking within the NFS/VM system.

Ian


The problem seems to originate with NFS's postop_attr information
that is returned with a read or write RPC. Within a vm_fault context,
the code cannot deal with vnode_pager_setsize() shrinking a vnode.

The workaround in the patch below stops the nfsm_postop_attr() macro
from ever shrinking a vnode. If the new size in the postop_attr
information is smaller, then it just sets the nfsnode n_attrstamp to 0
to stop the wrong size getting used in the future. This change only
affects postop_attr attributes; the nfsm_loadattr() macro works as
normal.

The change is implemented by adding a new argument to nfs_loadattrcache()
called 'dontshrink'. When this is non-zero, nfs_loadattrcache() will never
reduce the vnode/nfsnode size; instead it zeros n_attrstamp.
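
A minimal sketch of the 'dontshrink' rule, using a made-up toy_nfsnode structure rather than the real nfsnode (only the n_attrstamp name is taken from the description above; everything else is an assumption for illustration):

```c
#include <stdint.h>

/* Toy stand-in for the cached attributes of an NFS file. */
struct toy_nfsnode {
    uint64_t n_size;        /* cached file size */
    long n_attrstamp;       /* when attributes were cached; 0 = invalid */
};

/*
 * Load a new size into the cache.  With dontshrink set, a smaller size is
 * never applied; the cache is invalidated instead so that the size is
 * refetched later from a safer context.
 * Returns 1 if the size was applied, 0 if it was refused.
 */
int toy_loadattrcache(struct toy_nfsnode *np, uint64_t new_size, long now,
    int dontshrink)
{
    if (dontshrink && new_size < np->n_size) {
        np->n_attrstamp = 0;    /* force a fresh attribute fetch later */
        return 0;
    }
    np->n_size = new_size;
    np->n_attrstamp = now;
    return 1;
}
```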

---

Hmm. We used this patch for a while - it stopped those particular vmopar
hangs, but another kind of deadlock has emerged (which happens with or
without the patch).

It seems that vinvalbuf() locks the vnode's v_interlock before calling
vm_object_page_remove(). vm_object_page_remove will then lock a page i.e.

 vinvalbuf() [Lock v_interlock] ->
 vm_object_page_remove() [Lock page]

If another process concurrently vm_fault's on the same vnode then it
locks the page, and finishes with a vput(vp). vput() locks the
interlock, so it results in:
 
 vm_fault() [Lock page] ->
 vput() [Lock v_interlock]

This is a simple lock-ordering deadlock. Since vm_fault can keep the
page locked for a considerable amount of time with NFS, this deadlock
can happen quite easily. I'm not sure what to suggest as a solution,
but keeping the v_interlock locked across a tsleep seems wrong... Any
ideas? Traces below.
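
The ordering conflict can be expressed as a tiny order checker. This is a hypothetical sketch, not kernel code: it treats the busy page as if it were an ordinary lock and simply records which lock was held when another was requested.

```c
/* The two resources involved, treated as plain locks for illustration. */
enum toy_lock { TOY_V_INTERLOCK, TOY_PAGE, TOY_NLOCKS };

struct toy_order {
    /* seen[a][b] != 0: some code path asked for b while holding a. */
    unsigned char seen[TOY_NLOCKS][TOY_NLOCKS];
};

/*
 * Record that `wanted' was requested while `held' was held.
 * Returns 1 if the opposite order has been recorded too, i.e. the two
 * paths can deadlock against each other.
 */
int toy_order_record(struct toy_order *o, enum toy_lock held,
    enum toy_lock wanted)
{
    o->seen[held][wanted] = 1;
    return o->seen[wanted][held] != 0;
}
```

Recording the vinvalbuf() path (interlock, then page) followed by the vm_fault() path (page, then interlock) makes the second call report the inversion.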


#12 0xc02140f0 in atkbd_isa_intr (unit=0) at ../../i386/isa/atkbd_isa.c:84
#13 0xc020eceb in wait ()
#14 0xc01e22d3 in _unlock_things (fs=0xca6f0ef0, dealloc=0)
at ../../vm/vm_fault.c:148
#15 0xc01e2b73 in vm_fault (map=0xca6d2ac0, vaddr=134766592,
fault_type=1 '\001', fault_flags=0) at ../../vm/vm_fault.c:745
#16 0xc0210252 in trap_pfault (frame=0xca6f0fbc, usermode=1, eva=134769544)
at ../../i386/i386/trap.c:816
#17 0xc020fda2 in trap (frame={tf_es = 39, tf_ds = 39, tf_edi = -1077946880,
  tf_esi = 1, tf_ebp = -1077947052, tf_isp = -898691100,
  tf_ebx = -1077946872, tf_edx = 4, tf_ecx = -1077947772, tf_eax = 2,
  tf_trapno = 12, tf_err = 4, tf_eip = 134769544, tf_cs = 31,
  tf_eflags = 66050, tf_esp = -1077947172, tf_ss = 39})
at ../../i386/i386/trap.c:358
#18 0x8086b88 in ?? ()

(kgdb) proc 1042
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:825
#1  0xc0150b4d in tsleep (ident=0xc0598534, priority=4,
wmesg=0xc024d22a "vmopar", timo=0) at ../../kern/kern_synch.c:443
#2  0xc01eaec6 in vm_page_sleep (m=0xc0598534, msg=0xc024d22a "vmopar",
busy=0xc0598563 "") at ../../vm/vm_page.c:1052
#3  0xc01e9aff in vm_object_page_remove (object=0xca6bac1c, start=0, end=0,
clean_only=1) at ../../vm/vm_object.c:1335
#4  0xc0172a6a in vinvalbuf (vp=0xca6bf700, flags=1, cred=0xc171ec80,
p=0xca6e5a40, slpflag=256, slptimeo=0) at ../../kern/vfs_subr.c:671
#5  0xc019541c in nfs_vinvalbuf (vp=0xca6bf700, flags=1, cred=0xc171ec80,
p=0xca6e5a40, intrflg=1) at ../../nfs/nfs_bio.c:978
#6  0xc01b6859 in nfs_open (ap=0xca6f3e2c) at ../../nfs/nfs_vnops.c:490
#7  0xc01796ae in vn_open (ndp=0xca6f3f00, fmode=1, cmode=1512)
at vnode_if.h:163
#8  0xc01760d9 in open (p=0xca6e5a40, uap=0xca6f3f94)
at ../../kern/vfs_syscalls.c:935
#9  0xc02108bf in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 134725618,
  tf_esi = -1077946896, tf_ebp = -1077946944, tf_isp = -898678812,
  tf_ebx = -1077946956, tf_edx = -1077946588, tf_ecx = 134893176,
  tf_eax = 5, tf_trapno = 12, tf_err = 2, tf_eip = 672042756, tf_cs = 31,
  tf_eflags = 514, tf_esp = -1077949296, tf_ss = 39})
at ../../i386/i386/trap.c:1100
#10 0xc01ff11c in Xint0x80_syscall ()
#11 0x8049d39 in ?? ()

-





Re: dhcp boot was: Re: diskless workstation

2000-11-05 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Doug Ambrisko writes:
>| to the kernel's output. I had a look at the pxe code in
>| /sys/boot/i386/libi386/pxe.c where pxeboot is built from and in
>| /sys/i386/i386/autoconf.c which is the kernel side and it looks like
>| they don't do anything about swap. There is a /* XXX set up swap? */
>| placeholder though. :-)
>
>Yep looks like you're right, I just tried it on 4.2-BETA it worked in 
>4.1.1.  Swap is now broken ... sigh this is going to be a problem.  I 
>guess the only thing you might be able to do in the interim is to do a 
>vnconfig of a file and then mount that as swap.  I think the vnconfig 
>man pages describes this.  Hopefully it works over NFS.

The diskless setup we use here is based on a compiled-in MFS root
rather than an NFS root, so we couldn't use the bootp code to enable
NFS swap.  Our solution was a modification to swapon() to enable
direct swapping to NFS regular files.

This results in the same swaponvp() call that the bootp code would
use (at the time we implemented this, swapping over NFS via vnconfig
was extremely unreliable; I think things are much better now).
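
The decision the modified swapon() makes can be modelled roughly as follows. The types and names are invented for this sketch, and DEV_BSIZE is assumed to be 512 bytes; the real code is in the patch.

```c
#include <stdint.h>

#define TOY_DEV_BSIZE 512       /* assumed value of DEV_BSIZE */

enum toy_vtype { TOY_VDISK, TOY_VREG, TOY_VDIR };

struct toy_swap_target {
    int ok;                 /* 0 = refuse to swap on this vnode */
    int use_device;         /* 1 = swap on the disk device itself */
    uint64_t nblks;         /* block count passed on (0 for disks) */
};

/*
 * Disks are handled as before (block count 0, as in the patch).  Regular
 * files on NFS are accepted, with the size taken from the file's
 * attributes.  Anything else is refused.
 */
struct toy_swap_target toy_swapon_choose(enum toy_vtype type, int is_nfs,
    uint64_t file_size)
{
    struct toy_swap_target t = { 0, 0, 0 };

    if (type == TOY_VDISK) {
        t.ok = 1;
        t.use_device = 1;
    } else if (type == TOY_VREG && is_nfs) {
        t.ok = 1;
        t.nblks = file_size / TOY_DEV_BSIZE;
    }
    return t;
}
```

Under these assumptions a 30MB swap file works out at 61440 blocks.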

The patch we use is below.

Ian

Index: vm_swap.c
===
RCS file: /FreeBSD/FreeBSD-CVS/src/sys/vm/vm_swap.c,v
retrieving revision 1.96
diff -u -r1.96 vm_swap.c
--- vm_swap.c   2000/01/25 17:49:12 1.96
+++ vm_swap.c   2000/11/05 11:04:34
@@ -202,10 +202,14 @@
NDFREE(&nd, NDF_ONLY_PNBUF);
vp = nd.ni_vp;
 
-   vn_isdisk(vp, &error);
-
-   if (!error)
+   if (vn_isdisk(vp, &error))
error = swaponvp(p, vp, vp->v_rdev, 0);
+   else if (vp->v_type == VREG && vp->v_tag == VT_NFS) {
+   struct vattr attr;
+   error = VOP_GETATTR(vp, &attr, p->p_ucred, p);
+   if (!error)
+   error = swaponvp(p, vp, NODEV, attr.va_size/DEV_BSIZE);
+   }
 
if (error)
vrele(vp);





Re: post-install of kernal sources, maxusers max?

2000-11-08 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Len Conrad 
writes:

># vmstat -z
...
>socket  607 1050 113/196K
...
>kern.ipc.maxsockets: 1064

>doesn't look like it to me.

I think a few slots are reserved, so you can consider 1050 as being
equal to 1064. Try putting

set kern.ipc.maxsockets=4000

in /boot/loader.rc and rebooting.

Ian





rootvnode

2000-11-30 Thread Ian Dowse


It appears that the pointer to the root vnode, 'rootvnode', does
not hold a corresponding vnode reference. Here's a fragment of code
from start_init():

/* Get the vnode for '/'.  Set p->p_fd->fd_cdir to reference it. */
if (VFS_ROOT(TAILQ_FIRST(&mountlist), &rootvnode))
panic("cannot find root vnode");
p->p_fd->fd_cdir = rootvnode;
VREF(p->p_fd->fd_cdir);
p->p_fd->fd_rdir = rootvnode;
VOP_UNLOCK(rootvnode, 0, p);

Since rootvnode is a global variable, three pointers to the root
vnode are stored, but only two references are counted (one by
VFS_ROOT, one by VREF).

Normally this is not a problem, since proc0's fd_cdir and fd_rdir
keep their references until the system is rebooted. However the
code in vfs_syscalls.c's checkdirs() function assumes that rootvnode
does hold a reference on the vnode:

if (rootvnode == olddp) {
vrele(rootvnode);
VREF(newdp);
rootvnode = newdp;
}

This bug reliably causes a panic on reboot if any filesystem has
been mounted directly over /. For example, try:

mount_mfs -T fd1440 none /
Ctrl-Alt-Delete

On -current the panic is 'vrele: missed vn_close'; on 4.1-STABLE it
is 'vrele: negative ref cnt'. It occurs in dounmount() at the lines

if ((coveredvp = mp->mnt_vnodecovered) != NULLVP) {
coveredvp->v_mountedhere = (struct mount *)0;
vrele(coveredvp);
}

when unmounting the second / filesystem. This occurs because
checkdirs() has stolen a reference to /, so the reference count
goes negative when we attempt to remove the last reference.
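
The reference counting can be replayed with a toy counter. This is a deliberately simplified model: it only considers proc0, it assumes the mount keeps one reference on the covered vnode, and it assumes checkdirs() releases one reference each for fd_cdir, fd_rdir and rootvnode. All toy_* names are invented for the sketch.

```c
struct toy_vnode { int usecount; };

void toy_vref(struct toy_vnode *vp) { vp->usecount++; }
void toy_vrele(struct toy_vnode *vp) { vp->usecount--; }

/*
 * Replay boot, mount over /, and unmount, returning the final use count
 * of the original root vnode.  A non-zero extra_ref models the proposed
 * fix where rootvnode holds its own reference.
 */
int toy_root_usecount_after_unmount(int extra_ref)
{
    struct toy_vnode root = { 0 };

    /* start_init(): one reference from VFS_ROOT, one from VREF. */
    toy_vref(&root);
    toy_vref(&root);
    if (extra_ref)
        toy_vref(&root);        /* proposed: reference for rootvnode */

    /* mount over /: assumed reference held for mnt_vnodecovered. */
    toy_vref(&root);

    /* checkdirs(): fd_cdir, fd_rdir and rootvnode each drop one. */
    toy_vrele(&root);
    toy_vrele(&root);
    toy_vrele(&root);

    /* dounmount(): vrele(coveredvp). */
    toy_vrele(&root);

    return root.usecount;
}
```

Without the extra reference the count ends below zero, which corresponds to the panic; with it, the count ends at exactly zero.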

This brings up another question: should the code reverse the changes
made by checkdirs() when a filesystem is unmounted? It certainly
seems to make sense to make rootvnode point to the underlying vnode
when the filesystem containing the current rootvnode is unmounted;
I'm not sure how useful fixing up other fd_cdir/fd_rdir pointers
would be.

I can produce a simple patch which does the following:
 -  vref(rootvnode) in start_init().
 -  vrele(rootvnode) if non-NULL, maybe in vfs_unmountall()
 -  point rootvnode at underlying vnode when the filesystem
containing rootvnode is unmounted.

Does this sound reasonable?

Ian





Re: fsck problem on large vinum volume

2001-01-07 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Jaye Mathisen writes:
>
>I have a 930GB vinum volume

>However, I can't fsck it, I have to always use the alternate block.

>newsfeed-inn2# fsck /dev/vinum/v-spool
>** /dev/vinum/v-spool
>BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
>/dev/vinum/v-spool: CANNOT FIGURE OUT FILE SYSTEM PARTITION

Jaye sent me a ktrace.out for the fsck that was failing. It appears
that the kernel had overshot the end of the superblock fs_csp[] array
in ffs_mountfs(), since the list of pointers there extended through
fs_maxcluster, fs_cpc, and fs_opostbl. This caused the mismatch between
the master and alternate superblocks.

The filesystem parameters were 8k/1k, and the total number of cylinder
groups was 29782. fs_cssize was 29782*sizeof(struct csum) = 477184
bytes. Hence 477184/8192 = ~59 entries were being used in fs_csp,
but fs_csp[] is only 31 entries long (15 on alpha).
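
The numbers above can be reproduced with a short calculation. The sketch assumes that struct csum is 16 bytes, that fs_cssize is rounded up to a whole number of fragments (which would explain 477184 rather than 29782*16 = 476512), and that MAXCSBUFS is 128 bytes worth of pointers minus one. The helper names are made up.

```c
#include <stdint.h>

uint32_t toy_roundup(uint32_t n, uint32_t unit)
{
    return (n + unit - 1) / unit * unit;
}

uint32_t toy_howmany(uint32_t n, uint32_t unit)
{
    return (n + unit - 1) / unit;
}

/* Summary area size: one csum per cylinder group, in whole fragments. */
uint32_t toy_cssize(uint32_t ncg, uint32_t csum_bytes, uint32_t fsize)
{
    return toy_roundup(ncg * csum_bytes, fsize);
}

/* fs_csp[] slots needed: one pointer per filesystem block of summary. */
uint32_t toy_csp_slots(uint32_t cssize, uint32_t bsize)
{
    return toy_howmany(cssize, bsize);
}

/* Slots available: 128 bytes of pointers, less one (assumed MAXCSBUFS). */
uint32_t toy_maxcsbufs(uint32_t pointer_bytes)
{
    return 128 / pointer_bytes - 1;
}
```

For 29782 cylinder groups on an 8k/1k filesystem this gives 59 slots needed against 31 available with 4-byte pointers.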

A larger block size should fix Jaye's case, but I think the correct
solution is to fix the kernel so that it is not constrained by the
MAXCSBUFS limit. There are a few ways to do this:

- Store the fs_csp information in struct ufsmount rather than
  in the superblock.
- Make use of the fact that the summary information is stored
  in one contiguous region, and update the 'fs_csp' macro to
  find the right offset directly.

I'll have a look and see which way looks neatest.

Ian





Re: fsck problem on large vinum volume

2001-01-07 Thread Ian Dowse


[moved to -fs]

In message <[EMAIL PROTECTED]>, Ian Dowse writes:
>
>Jaye sent me a ktrace.out for the fsck that was failing. It appears
>that the kernel had overshot the end of the superblock fs_csp[] array
>in ffs_mountfs(), since the list of pointers there extended through
>fs_maxcluster, fs_cpc, and fs_opostbl. This caused the mismatch between
>the master and alternate superblocks.
>
>The filesystem parameters were 8k/1k, and the total number of cylinder
>groups was 29782. fs_cssize was 29782*sizeof(struct csum) = 477184
>bytes. Hence 477184/8192 = ~59 entries were being used in fs_csp,
>but fs_csp[] is only 31 entries long (15 on alpha).

Here is a patch which should avoid the possibility of overflowing
the fs_csp[] array. The idea is that since all summary blocks are
stored in one contiguous malloc'd region, there is no need to
have a separate pointer to the start of each block within that
region.

This is achieved by simplifying the 'fs_cs' macro from

fs_csp[(indx) >> (fs)->fs_csshift][(indx) & ~(fs)->fs_csmask]
to
fs_csp[0][indx]

so that only the start of the malloc'd region is needed, and can always
be placed in fs_csp[0] without the risk of overflow.
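
To check that the two forms of the macro address the same element, here is a small standalone sketch. It uses a toy csum structure and hypothetical helper names, and assumes that the shift is log2 of the entries per block and that the mask is the complement of (entries per block - 1), which is what the old expression implies.

```c
#include <stddef.h>

struct toy_csum { int cs_ndir, cs_nbfree, cs_nifree, cs_nffree; };

/* Old form: choose the block pointer, then the entry within the block. */
struct toy_csum *toy_cs_old(struct toy_csum **csp, unsigned csshift,
    unsigned csmask, unsigned indx)
{
    return &csp[indx >> csshift][indx & ~csmask];
}

/* New form: index directly from the start of the contiguous region. */
struct toy_csum *toy_cs_new(struct toy_csum **csp, unsigned indx)
{
    return &csp[0][indx];
}

/*
 * Build a per-block pointer table over one contiguous region and count
 * the indices for which the two forms disagree.
 */
unsigned toy_cs_mismatches(struct toy_csum *region, unsigned nentries,
    unsigned csshift)
{
    enum { TOY_MAXBLKS = 64 };
    struct toy_csum *csp[TOY_MAXBLKS];
    unsigned per_block = 1u << csshift;
    unsigned csmask = ~(per_block - 1);
    unsigned nblks = (nentries + per_block - 1) / per_block;
    unsigned i, bad = 0;

    if (nblks > TOY_MAXBLKS)
        return nentries;        /* table too small for this toy */
    for (i = 0; i < nblks; i++)
        csp[i] = region + (size_t)i * per_block;
    for (i = 0; i < nentries; i++)
        if (toy_cs_old(csp, csshift, csmask, i) != toy_cs_new(csp, i))
            bad++;
    return bad;
}
```

Because every block pointer is just an offset into the same region, only csp[0] carries information, which is why the remaining slots can be dropped.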

I have only tested this to the extent that the kernel compiles and
runs, and only on -stable.

Any comments or suggestions?

Ian

Index: ffs/ffs_vfsops.c
===
RCS file: /home/iedowse/CVS/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.134
diff -u -r1.134 ffs_vfsops.c
--- ffs/ffs_vfsops.c 2000/12/13 10:03:52 1.134
+++ ffs/ffs_vfsops.c 2001/01/07 19:04:06
@@ -365,7 +365,7 @@
 {
register struct vnode *vp, *nvp, *devvp;
struct inode *ip;
-   struct csum *space;
+   caddr_t space;
struct buf *bp;
struct fs *fs, *newfs;
struct partinfo dpart;
@@ -432,7 +432,7 @@
 * Step 3: re-read summary information from disk.
 */
blks = howmany(fs->fs_cssize, fs->fs_fsize);
-   space = fs->fs_csp[0];
+   space = (caddr_t)fs->fs_csp[0];
for (i = 0; i < blks; i += fs->fs_frag) {
size = fs->fs_bsize;
if (i + fs->fs_frag > blks)
@@ -441,7 +441,8 @@
NOCRED, &bp);
if (error)
return (error);
-   bcopy(bp->b_data, fs->fs_csp[fragstoblks(fs, i)], (u_int)size);
+   bcopy(bp->b_data, space, (u_int)size);
+   space += size;
brelse(bp);
}
/*
@@ -513,7 +514,7 @@
register struct fs *fs;
dev_t dev;
struct partinfo dpart;
-   caddr_t base, space;
+   caddr_t space;
int error, i, blks, size, ronly;
int32_t *lp;
struct ucred *cred;
@@ -623,18 +624,18 @@
blks = howmany(size, fs->fs_fsize);
if (fs->fs_contigsumsize > 0)
size += fs->fs_ncg * sizeof(int32_t);
-   base = space = malloc((u_long)size, M_UFSMNT, M_WAITOK);
+   space = malloc((u_long)size, M_UFSMNT, M_WAITOK);
+   fs->fs_csp[0] = (struct csum *)space;
for (i = 0; i < blks; i += fs->fs_frag) {
size = fs->fs_bsize;
if (i + fs->fs_frag > blks)
size = (blks - i) * fs->fs_fsize;
if ((error = bread(devvp, fsbtodb(fs, fs->fs_csaddr + i), size,
cred, &bp)) != 0) {
-   free(base, M_UFSMNT);
+   free(fs->fs_csp[0], M_UFSMNT);
goto out;
}
bcopy(bp->b_data, space, (u_int)size);
-   fs->fs_csp[fragstoblks(fs, i)] = (struct csum *)space;
space += size;
brelse(bp);
bp = NULL;
@@ -691,7 +692,7 @@
if (ronly == 0) {
if ((fs->fs_flags & FS_DOSOFTDEP) &&
(error = softdep_mount(devvp, mp, fs, cred)) != 0) {
-   free(base, M_UFSMNT);
+   free(fs->fs_csp[0], M_UFSMNT);
goto out;
}
if (fs->fs_snapinum[0] != 0)
Index: ffs/fs.h
===
RCS file: /home/iedowse/CVS/src/sys/ufs/ffs/fs.h,v
retrieving revision 1.16
diff -u -r1.16 fs.h
--- ffs/fs.h 2000/07/04 04:55:48 1.16
+++ ffs/fs.h 2001/01/07 18:55:44
@@ -108,10 +108,10 @@
 /*
  * The limit on the amount of summary information per file system
  * is defined by MAXCSBUFS. It is currently parameterized for a
- * size of 128 bytes (2 million cylinder groups on machines with
- * 32-bit pointers, and 1 million on 64-bit machines). One pointer
- * is taken away to point to an array of cluster sizes that is
- * computed as cylinder group

Re: Swapping in diskless ? (was :Re: [hackers] Re: getting rid of sysinstall)

2001-07-13 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, David Gilbert writes:
>Is it not possible (or has nobody done it) to swap with the current
>diskless boot?

I do remember some problem with PXE and swap, but I forget the
details or if it was resolved. The diskless setup that we have
locally uses an MFS root image in the kernel instead of an NFS
root, which meant that we couldn't use DHCP tags to configure swap.
Our solution was a small patch that allows swapon(8) to configure
direct swapping to NFS regular files. This does the same thing as
the DHCP swap tags, but is much more controllable - the rc scripts
can do something like:

swap=/swap/swapfile
rm -f $swap
truncate -s 30M $swap
swapon $swap

The patch (against RELENG_4) is below; I wonder should this just
be committed? We have certainly found it quite useful.

Ian

Index: vm_swap.c
===
RCS file: /FreeBSD/FreeBSD-CVS/src/sys/vm/vm_swap.c,v
retrieving revision 1.96.2.1
diff -u -r1.96.2.1 vm_swap.c
--- vm_swap.c   2000/10/13 07:13:23 1.96.2.1
+++ vm_swap.c   2001/07/13 23:12:10
@@ -202,10 +202,14 @@
NDFREE(&nd, NDF_ONLY_PNBUF);
vp = nd.ni_vp;
 
-   vn_isdisk(vp, &error);
-
-   if (!error)
+   if (vn_isdisk(vp, &error))
error = swaponvp(p, vp, vp->v_rdev, 0);
+   else if (vp->v_type == VREG && vp->v_tag == VT_NFS) {
+   struct vattr attr;
+   error = VOP_GETATTR(vp, &attr, p->p_ucred, p);
+   if (!error)
+   error = swaponvp(p, vp, NODEV, attr.va_size/DEV_BSIZE);
+   }
 
if (error)
vrele(vp);





Default retry behaviour for mount_nfs

2001-07-19 Thread Ian Dowse


Shortly after the TI-RPC changes in -current, the default retry
behaviour for mount_nfs was changed. Previously, mount_nfs would
keep retrying for a long time (~1 week) if the server didn't respond,
but since revision 1.40 of mount_nfs.c, it gives up on non-background
mounts after one attempt.

I didn't back out this change in default behaviour in my later
commits to this file, since it seemed like a more reasonable default;
NFS filesystems listed in fstab without any options can no
longer hang the boot process waiting for the server to respond,
and background mounts will succeed whenever the server comes up.
I subsequently MFC'd this about 3 weeks ago.

What I just remembered the other day is that there is a class of
situations where you do want certain NFS mounts to hang the boot
process if the server is down. These include cases where an NFS
filesystem is critical to the boot process, so the machine will
get stuck if it tries to proceed without it. The changes to mount_nfs
had broken support for that situation, but I committed a fix to
-current today that allows you to add `-R0' to the mount options
to force mount_nfs to retry forever.

So the question is - should I keep the new behaviour that is probably
a better default and will catch out fewer new users but may surprise
some experienced users, or should I revert to the traditional
default where `-R1' or `-b' are required to avoid boot-time hangs?

Ian




Re: Default retry behaviour for mount_nfs

2001-07-20 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Terry Lambert writes:
>> FWIW, I vote that we revert to the traditional default and require
>> -R1 or -b to avoid boot time hangs. The standard behaviour for most
>> NFS implementations that I'm aware of would do this.
>
>I agree; people at work have bitched about this.  We have a
>FreeBSD NFS server that's flakey.

Ok, from the small set of responses so far, it seems that the most
acceptable option is to change mount_nfs to behave in the old way
where it will retry forever by default even in foreground mode.
Below is a proposed patch that does this. It also adds two paragraphs
near the start of the manpage which describe the default behaviour
and point readers at the relevant options. Comments welcome.
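
For reference, the two default policies and the meaning of the retry count can be modelled as below. This is a sketch with made-up names (TOY_BGRND stands in for the BGRND flag), not the mount_nfs source.

```c
#define TOY_BGRND 0x1   /* stand-in for the background-mount flag (-b) */

/* Current default: retry forever (0) for -b mounts, else try once. */
int toy_default_retrycnt_current(int opflags)
{
    return (opflags & TOY_BGRND) ? 0 : 1;
}

/* Proposed default: always retry forever unless -R says otherwise. */
int toy_default_retrycnt_proposed(int opflags)
{
    (void)opflags;
    return 0;
}

/*
 * Decide whether to try again after `attempts' failed attempts.
 * A retry count of zero means keep retrying forever.
 */
int toy_should_retry(int retrycnt, int attempts)
{
    return retrycnt == 0 || attempts < retrycnt;
}
```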

>The other thing is that it appears to break amd behaviour.

Does amd use mount_nfs(8)? I thought it did the mount syscalls
directly.

Ian


Index: mount_nfs.8
===
RCS file: /dump/FreeBSD-CVS/src/sbin/mount_nfs/mount_nfs.8,v
retrieving revision 1.27
diff -u -r1.27 mount_nfs.8
--- mount_nfs.8 2001/07/19 21:11:48 1.27
+++ mount_nfs.8 2001/07/20 22:20:35
@@ -71,6 +71,28 @@
 .%T "NFS: Network File System Version 3 Protocol Specification" ,
 Appendix I.
 .Pp
+By default,
+.Nm
+keeps retrying until the mount eventually succeeds.
+This behaviour is intended for filesystems listed in
+.Xr fstab 5
+that are critical to the boot process.
+For non-critical filesystems, the
+.Fl R
+and
+.Fl b
+flags provide mechanisms to prevent the boot process from hanging
+if the server is unavailable.
+.Pp
+If the server becomes unresponsive while an NFS filesystem is
+mounted, any new or outstanding file operations on that filesystem
+will hang uninterruptibly until the server comes back.
+To modify this default behaviour, see the
+.Fl i
+and
+.Fl s
+flags.
+.Pp
 The options are:
 .Bl -tag -width indent
 .It Fl 2
@@ -126,12 +148,8 @@
 help, but for normal desktop clients this does not apply.)
 .It Fl R
 Set the mount retry count to the specified value.
-A retry count of zero means to keep retrying forever.
-By default,
-.Nm
-retries forever on background mounts (see the
-.Fl b
-option), and otherwise tries just once.
+The default is a retry count of zero, which means to keep retrying
+forever.
 There is a 60 second delay between each attempt.
 .It Fl T
 Use TCP transport instead of UDP.
Index: mount_nfs.c
===
RCS file: /dump/FreeBSD-CVS/src/sbin/mount_nfs/mount_nfs.c,v
retrieving revision 1.45
diff -u -r1.45 mount_nfs.c
--- mount_nfs.c 2001/07/19 21:11:48 1.45
+++ mount_nfs.c 2001/07/20 21:37:19
@@ -486,7 +486,8 @@
name = *argv;
 
if (retrycnt == -1)
-   retrycnt = (opflags & BGRND) ? 0 : 1;
+   /* The default is to keep retrying forever. */
+   retrycnt = 0;
if (!getnfsargs(spec, nfsargsp))
exit(1);
 




fdisk(8) adjusting to head/cylinder bounderies

2001-07-21 Thread Ian Dowse


For about a year, fdisk(8) has had code that automatically adjusts
partitions to begin on a head boundary and end on a cylinder
boundary. This is fine in most situations, but the way it is
implemented makes it awkward to override, and more importantly it
is way too easy to mess up an existing partition that is not properly
aligned on a head/cylinder boundary.

Currently, fdisk never asks the user for confirmation of changes
to the partition start and size. It just prints out a message
such as

WARNING: adjusting start offset of partition
to 12345 to fall on a head boundary

and then it immediately goes on to print out the full slice details,
so that warning is easily missed. It is possible to avoid the
automatic adjustment by answering "y" to the

Explicitly specify beg/end address?

question that refers to setting the c/h/s parameters, but if you
do that, then you can't make use of the automatic c/h/s calculation.

This problem bites me almost every time I use fdisk, since we have
a lot of disks that have been split into multiple partitions to
get around the 7 partitions/slice limit. I have always just changed
the slice to end exactly where the last partition ends, so having
fdisk rounding that down by a few sectors is not desirable. These
disks are generally SCSI, contain only FreeBSD partitions, and the
BIOSes we work with have never had problems with partitions that
are not head/cylinder aligned.

Below is a patch that makes fdisk request user confirmation before
making any changes to the start and end of partitions. It also
untangles the automatic c/h/s calculation from the start/size
adjustment, and doesn't set the partition type to 0 if the
adjustment fails.
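
The kind of check and adjustment being discussed can be sketched as follows. Note the assumptions: the names are invented, the start is tested against sectors per track, and the end is tested against sectors per cylinder (heads * sectors), which is this sketch's own choice and not necessarily the exact test in the patch.

```c
#include <stdint.h>

struct toy_geom {
    uint32_t sectors;       /* sectors per track */
    uint32_t heads;         /* tracks per cylinder */
};

/* 1 if start is on a head boundary and the end is on a cylinder boundary. */
int toy_is_aligned(const struct toy_geom *g, uint32_t start, uint32_t size)
{
    uint32_t cylsecs = g->sectors * g->heads;

    return start % g->sectors == 0 && (start + size) % cylsecs == 0;
}

/*
 * Round the start up to a head boundary and the end down to a cylinder
 * boundary.  Returns 0, leaving the values untouched, if nothing would
 * be left of the partition.
 */
int toy_align(const struct toy_geom *g, uint32_t *start, uint32_t *size)
{
    uint32_t cylsecs = g->sectors * g->heads;
    uint32_t end = *start + *size;
    uint32_t new_start = (*start + g->sectors - 1) / g->sectors * g->sectors;
    uint32_t new_end = end / cylsecs * cylsecs;

    if (new_end <= new_start)
        return 0;
    *start = new_start;
    *size = new_end - new_start;
    return 1;
}
```

A caller would only invoke toy_align() after the user has confirmed, which is the behaviour the patch aims for in the interactive case.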

I haven't put a great deal of thought into the specifics of the
patch, so any comments or suggestions are welcome. I just want to
avoid the behaviour where carefully calculated partition parameters
supplied by the user get changed automatically with only an easily-
missed warning printed.

Ian

Index: fdisk.c
===
RCS file: /dump/FreeBSD-CVS/src/sbin/i386/fdisk/fdisk.c,v
retrieving revision 1.50
diff -u -r1.50 fdisk.c
--- fdisk.c 2001/07/13 16:48:56 1.50
+++ fdisk.c 2001/07/21 12:02:01
@@ -548,6 +548,7 @@
Decimal("sysid (165=FreeBSD)", partp->dp_typ, tmp);
Decimal("start", partp->dp_start, tmp);
Decimal("size", partp->dp_size, tmp);
+   sanitize_partition(partp);
 
if (ok("Explicitly specify beg/end address ?"))
{
@@ -572,8 +573,6 @@
partp->dp_esect = DOSSECT(tsec,tcyl);
partp->dp_ehd = thd;
} else {
-   if (!sanitize_partition(partp))
-   partp->dp_typ = 0;
dos(partp->dp_start, partp->dp_size,
&partp->dp_scyl, &partp->dp_ssect, &partp->dp_shd);
dos(partp->dp_start + partp->dp_size - 1, partp->dp_size,
@@ -1398,6 +1397,17 @@
 
 max_end = partp->dp_start + partp->dp_size;
 
+if (partp->dp_start % dos_sectors != 0 ||
+   (partp->dp_start + partp->dp_size) % dos_sectors != 0) {
+   if (partp->dp_start % dos_sectors != 0)
+   warnx("WARNING: partition does not begin on a head boundary");
+   if ((partp->dp_start + partp->dp_size) % dos_sectors != 0)
+   warnx("WARNING: partition does not end on a cylinder boundary");
+   warnx("WARNING: this may confuse the BIOS or other operating systems");
+   if (!ok("Correct this automatically?"))
+   return(1);
+}
+
 /*
  * Adjust start upwards, if necessary, to fall on an head boundary.
  */
@@ -1412,9 +1422,7 @@
 "ERROR: unable to adjust start of partition to fall on a head boundary");
return (0);
 }
-   warnx(
-"WARNING: adjusting start offset of partition\n\
-to %u to fall on a head boundary",
+   warnx("WARNING: adjusting start offset of partition to %u",
(u_int)(prev_head_boundary + dos_sectors));
partp->dp_start = prev_head_boundary + dos_sectors;
 }
@@ -1434,10 +1442,7 @@
return (0);
 }
 if (adj_size != partp->dp_size) {
-   warnx(
-"WARNING: adjusting size of partition to %u to end on a\n\
-cylinder boundary",
-   (u_int)adj_size);
+   warnx("WARNING: adjusting size of partition to %u", (u_int)adj_size);
partp->dp_size = adj_size;
 }
 if (partp->dp_size == 0) {

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: fdisk(8) adjusting to head/cylinder bounderies

2001-07-21 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Brian Dean writes:
>On Sat, Jul 21, 2001 at 02:47:29PM +0100, Ian Dowse wrote:
>
>> Below is a patch that makes fdisk request user confirmation before
>> making any changes to the start and end of partitions.
>
>Please allow this behaviour to be overridden by a flag that can be
>specified so that scripts don't suddenly stop and wait for input.

Sorry, I should have mentioned this; the patch only changes the
interactive case. The code to adjust the partition offsets and
sizes for config file based updates using the -f option has not
changed.

Ian




Re: crunched binary oddity

2001-07-26 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Peter Pentchev writes:
>On Tue, Jul 24, 2001 at 10:14:09AM -0700, Etienne de Bruin wrote:
>> Greetings.  I crunchgen'd newfs and linked mount_mfs to it (among many other
>> progs), compiled it with success.  And yet when I boot my MFS kernel and try
>> to mount /tmp to mfs, boot_crunch complains that 'mfs' is not compiled into
>> it?
>
>Could it be that it's not boot_crunch, but the kernel complaining?
>What is the exact error message?

When mount(8) invokes a mount_xxx program, it sets argv[0] to the
name of the filesystem (ufs, mfs, nfs etc). Crunched binaries use
the argv[0] name to determine which code to execute, so you need
to add

ln mount_mfs mfs

to your crunchgen config file to get this to work. Alternatively,
just invoke mount_mfs directly instead of using mount(8).
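
To make the argv[0] dispatch concrete, here is a minimal sketch of the kind of lookup a crunched binary performs. It is not the code that crunchgen generates; the table layout, `crunch_lookup()` and the stub entry points are all hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef int (*main_fn)(int, char **);

/* Hypothetical stand-ins for the entry points of the crunched programs. */
static int
newfs_main(int argc, char **argv)
{
	(void)argc; (void)argv;
	return (0);
}

static int
mount_mfs_main(int argc, char **argv)
{
	(void)argc; (void)argv;
	return (0);
}

static const struct {
	const char *name;
	main_fn fn;
} entries[] = {
	{ "newfs", newfs_main },
	{ "mount_mfs", mount_mfs_main },
	{ "mfs", mount_mfs_main },	/* alias added by "ln mount_mfs mfs" */
};

/* Map argv[0] (with any leading directories stripped) to an entry point. */
static main_fn
crunch_lookup(const char *argv0)
{
	const char *base = strrchr(argv0, '/');
	size_t i;

	base = (base != NULL) ? base + 1 : argv0;
	for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++)
		if (strcmp(base, entries[i].name) == 0)
			return (entries[i].fn);
	return (NULL);		/* name not compiled into the binary */
}
```

Without the "mfs" alias in the table, a call made with argv[0] set to "mfs" finds nothing, which matches the symptom reported above.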

Ian




Re: vnconfig + mount removes permission for a second

2001-08-03 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, David Malone writes:
>
>When you do a mount it automatically HUP's mountd which then
>re-exports NFS filesystems. I suspect what is happening is that
>the filesystem mountlist is being cleared for a moment and that
>is upsetting the cp.

Yes, the mountd-kernel interface for updating export lists is a
bit stupid; you have to clear all exports and then add each allowed
host/net one by one. Any NFS requests that come in after the exports
have been deleted but before the entries have been re-added will
get rejected.

See PRs misc/3980 and kern/9619 for more details. I think NetBSD
tried at one point to make mountd incrementally change the export
list, but it turned out to be quite hard to get the logic right to
keep the mountd and kernel lists in sync. I think they reverted
that change eventually.

This is certainly a bug that needs to be fixed; mountd should be
able to build up a list of all exports for a filesystem and pass
them into the kernel in one "replace export list"  operation.

Maybe nice'ing mountd to run at a higher priority, and/or specifying
only IP addresses in /etc/exports would help things a bit now.

Ian




Re: problems with kvm_nlist()

2001-08-03 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Tabor Kelly writes:
>Now that that is taken care of, would somebody mind explaining to me
>what n_value represents? Is it an offset in kernel memory to retrieve
>the actual data?

It is the kernel virtual address of the symbol that you specified
in n_name, which will be the same as an in-kernel pointer value
(e.g. something like 0xc0123456). This address has no meaning in
userland, but libkvm provides a kvm_read() function that does all
the magic necessary to read from the kernel memory at this address.

There are lots of examples of code using the libkvm interface in
the FreeBSD source tree (fstat, ps, vmstat, pstat etc.) although
many of these now use sysctl to retrieve values instead. Briefly,
you just kvm_read the value of the variable whose symbol address
you have found, e.g. something like the code below, but you'll want
to add code to deal with any errors that the kvm_* calls might
return.

struct nlist nl[] = {
	{"nextpid"},
	{NULL},
};
int nextpid;

kd = kvm_openfiles(...);
kvm_nlist(kd, nl);
kvm_read(kd, nl[0].n_value, &nextpid, sizeof(nextpid));
printf("nextpid is %d\n", nextpid);

Ian




Reading files within the kernel (was Re: allocating userland space...)

2001-08-13 Thread Ian Dowse

In message <003101c12411$294adaa0$[EMAIL PROTECTED]>, Sansonetti Laurent writes:
>Hello hackers,
>I'm currently working on a kld syscall module which needs to read a config
>file at startup (MOD_LOAD).
>Following the advice of Eugene L. Vorokov, I tried to allocate some userland
>space with mmap() to store a open_args struct, fill-it with copyout() /
>subyte()... and call open with curproc on first argument.

I really don't understand why people try these obscure mechanisms
to read files within the kernel. There are existing kernel interfaces
for accessing files that are much cleaner than these hacks. You
can't use the familiar open/read/close calls, but using the vnode
interface is really not that hard.

Below is a simple KLD that prints /etc/motd on the console. There's
not a lot involved really, since vn_open(), vn_rdwr() and vn_close()
do most of the hard bits. The most strange stuff is probably the
setting up of the nameidata structure, but even it isn't too
complicated.

To try it, just save the two files below in a directory, and run

make depend
make
kldload ./kernio.ko

(WARNING: not highly tested, so it may crash your machine!)

For further reference, most of the VOP_* functions are documented
in section 9 man pages.

Ian

-------- Makefile --------
KLDMOD= true
KMOD=   kernio
SRCS=   vnode_if.h kernio.c

NOMAN=
CFLAGS+= -I${.CURDIR}/.. -I/usr/src/sys

.include <bsd.kmod.mk>

-------- kernio.c --------
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

static int kernio_example(void);
static int kernio_open(int pathseg, const char *path, int flags,
    struct proc *p, struct vnode **vpp);
static void kernio_close(struct vnode *vp, int flags, struct proc *p);

static int
kernio_modevent(module_t mod, int type, void *unused) {
	switch (type) {
	case MOD_LOAD:
		return kernio_example();

	case MOD_UNLOAD:
		break;
	default:
		break;
	}
	return 0;
}

static int
kernio_example(void) {
	struct vattr vattr;
	struct proc *p;
	struct vnode *vp;
	char *buf, *cp;
	int error, filesize, flags, pos, resid;

	p = curproc;
	flags = FREAD;
	buf = NULL;

	/* Open the file, and get its size. */
	error = kernio_open(UIO_SYSSPACE, "/etc/motd", flags, p, &vp);
	if (error)
		return (error);
	error = VOP_GETATTR(vp, &vattr, p->p_ucred, p);
	if (error)
		goto errout;
	filesize = vattr.va_size;
	printf("file size = %d\n", filesize);

	/* Allocate space for the file contents. */
	MALLOC(buf, char *, filesize, M_TEMP, M_WAITOK);
	if (buf == NULL)
		goto errout;

	/* Read in the complete file to `buf'. */
	error = vn_rdwr(UIO_READ, vp, buf, filesize, 0, UIO_SYSSPACE,
	    IO_NODELOCKED, p->p_ucred, &resid, p);
	if (error)
		goto errout;

	/* Silly example; print out the file line by line. */
	cp = buf;
	for (pos = 0; pos < filesize; pos++) {
		if (buf[pos] != '\n')
			continue;

		buf[pos] = '\0';
		printf("%s\n", cp);
		cp = &buf[pos] + 1;
	}

errout:
	if (buf != NULL)
		FREE(buf, M_TEMP);
	kernio_close(vp, flags, p);
	return (error);
}


static int
kernio_open(int pathseg, const char *path, int flags, struct proc *p,
    struct vnode **vpp)
{
	struct nameidata nd;
	struct vnode *vp;
	int error;

	NDINIT(&nd, LOOKUP, FOLLOW, pathseg, path, p);
#if __FreeBSD_version < 50
	error = vn_open(&nd, flags, 0);
#else
	error = vn_open(&nd, &flags, 0);
#endif
	if (error)
		return (error);
	NDFREE(&nd, NDF_ONLY_PNBUF);
	vp = nd.ni_vp;
	if (vp->v_type != VREG) {
		VOP_UNLOCK(vp, 0, p);
		vn_close(vp, flags, p->p_ucred, p);
		return (EACCES);
	}
	*vpp = vp;
	return (0);
}

static void
kernio_close(struct vnode *vp, int flags, struct proc *p)
{
	VOP_UNLOCK(vp, 0, p);
	vn_close(vp, flags, p->p_ucred, p);
}


moduledata_t kernio_mod = {
	"kernio",
	kernio_modevent,
	0
};
DECLARE_MODULE(kernio, kernio_mod, SI_SUB_DRIVERS, SI_ORDER_ANY);





Re: Reading files within the kernel (was Re: allocating userland space...)

2001-08-14 Thread Ian Dowse

In message <003401c1244d$1fa6ee80$[EMAIL PROTECTED]>, Sansonetti Laurent writes:
>A another stupid question, how can I do to stop the loading process in
>MOD_LOAD event handler (in my case, if the cfg file doesn't exist, it should
>be better to interrupt..) ?

Someone else might have a better idea of how this works, but it
seems to me that the best you can do is printf a descriptive error
message and return a non-zero value from the module event handler
function. The return code from the event handler will be printed
on the console by the kernel, and the event handler will then
immediately be called with MOD_UNLOAD.

It seems that the KLD is not actually unloaded in this case, and
no error is returned to the kldload process, but the user can
then manually unload the KLD, correct the problem and try again.

That's just from a quick read of the code so it may be wrong. Try
adding printf's to the MOD_LOAD and MOD_UNLOAD cases in the event
handler, and see what happens when MOD_LOAD returns non-zero.

Ian




Serious i386 interrupt mask bug in RELENG_4 (was Re: 4.4-RC NFS panic)

2001-08-23 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Warner Losh writes:
>
>I think that might be due to a bug in the shared interrupt code that
>Ian Dowse sent me about earlier today.

Just to add a few details - there is a bug in the update_masks()
function in i386/isa/intr_machdep.c that can cause some interrupts
to occur at times when they should be masked. The problem only
occurs with certain configurations of shared interrupts and devices,
and this code is only present in RELENG_4.

The update_masks() function is called after an interrupt handler
has been registered or removed. Its main function is to update the
interrupt masks (tty_imask, net_imask etc) if necessary (e.g. if
IRQ11 is registered by a tty-type device, IRQ11 will be added to
tty_imask so that future spltty()'s will mask IRQ11).

A second function of update_masks() is to update the cached copy
of the interrupt mask stored with each handler for a multiplexed
interrupt. This is done via the call to update_mux_masks().

The bug is that update_masks() returns without calling update_mux_masks()
in some cases where it should call it. Specifically, if a newly-added
multiplexed interrupt handler has the same maskptr as another
handler on the same IRQ line, that new handler doesn't get its
cached mask set. For example if a single IRQ has a usb device and
a modem (tty), the second device to register its handler will get
its idesc->mask set to 0 instead of the value of tty_imask because
update_mux_masks() may never be called to set it. Of course, if
update_masks() is called later for some other device it may correct
the situation.

Interrupt handlers are called with intr_mask[irq] or'd into the
cpl to block further interrupts; for non-multiplexed interrupts
intr_mask[irq] will be set from one of the *_imask masks. However with
multiplexed interrupts, only the IRQ itself (and SWI_CLOCK_MASK)
are blocked, and the multiplex handler intr_mux() needs to raise
the cpl further when necessary. It uses idesc->mask to control
this.

When this bug occurs, idesc->mask == 0, so the device interrupt
handler gets called with only the IRQ and SWI_CLOCK_MASK masked,
instead of the full *_imask that it requested. Not good.
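
The early return described above can be reproduced in a small userland model. This is a deliberately simplified sketch, not the code from intr_machdep.c: it models only the "handler added" branch, and the types and function names (`struct irq_chain`, `add_handler()`, `update_masks_buggy()`, `update_masks_fixed()`) are hypothetical.

```c
#define MAX_HANDLERS 4

/* One handler on a shared IRQ line. */
struct handler {
	unsigned *maskptr;	/* class mask it belongs to, e.g. &tty_imask */
	unsigned mask;		/* cached copy used while the handler runs */
};

struct irq_chain {
	struct handler h[MAX_HANDLERS];
	int n;
};

/* Register a handler; its cached mask starts out as 0. */
static int
add_handler(struct irq_chain *c, unsigned *maskptr)
{
	if (c->n >= MAX_HANDLERS)
		return (-1);
	c->h[c->n].maskptr = maskptr;
	c->h[c->n].mask = 0;
	c->n++;
	return (0);
}

/* Plays the role of update_mux_masks(): refresh every cached copy. */
static void
refresh_cached_masks(struct irq_chain *c)
{
	int i;

	for (i = 0; i < c->n; i++)
		c->h[i].mask = *c->h[i].maskptr;
}

/* Model of the bug: nothing is refreshed if the bit is already set. */
static void
update_masks_buggy(struct irq_chain *c, unsigned *maskptr, unsigned irqbit)
{
	if ((*maskptr & irqbit) != 0)
		return;		/* new handler keeps mask == 0 */
	*maskptr |= irqbit;
	refresh_cached_masks(c);
}

/* Model of the fix: always update, always refresh. */
static void
update_masks_fixed(struct irq_chain *c, unsigned *maskptr, unsigned irqbit)
{
	*maskptr |= irqbit;
	refresh_cached_masks(c);
}
```

Registering two handlers that share one class mask and calling the buggy variant after each leaves the second handler with a cached mask of 0; the fixed variant gives both handlers the full class mask.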

On my laptop, this bug causes hangs within minutes of starting to
use a pccard modem, but as should be apparent from the above it
could strike virtually anywhere that multiplexed interrupts are
used. The patch below seems to solve the problem; it just causes
update_masks() to unconditionally update the masks.

Ian


Index: intr_machdep.c
===
RCS file: /home/iedowse/CVS/src/sys/i386/isa/intr_machdep.c,v
retrieving revision 1.29.2.2
diff -u -r1.29.2.2 intr_machdep.c
--- intr_machdep.c  2000/08/16 05:35:34 1.29.2.2
+++ intr_machdep.c  2001/08/23 20:24:17
@@ -651,15 +651,9 @@
 
if (find_idesc(maskptr, irq) == NULL) {
/* no reference to this maskptr was found in this irq's chain */
-   if ((*maskptr & mask) == 0)
-   return;
-   /* the irq was included in the classes mask, remove it */
*maskptr &= ~mask;
} else {
/* a reference to this maskptr was found in this irq's chain */
-   if ((*maskptr & mask) != 0)
-   return;
-   /* put the irq into the classes mask */
*maskptr |= mask;
}
/* we need to update all values in the intr_mask[irq] array */





Re: VM Corruption - stumped, anyone have any ideas?

2001-09-24 Thread Ian Dowse

>
>The pointers in the last few entries of the vm_page_buckets array got
>corrupted when an argument to a function that manipulated whatever was next
>in ram was 0, and it turned out that it was 0 because
> of some PTE flushing thing (you are the one that found it... remember?)

I think I've also seen a few reports of programs exiting with
"Profiling timer expired" messages with 4.4. These can be caused
by stack overflows, since the p_timer[] array in struct pstats is
one of the things that I think lives below the per-process kernel
stack. I wonder if they are related? Stack overflows could result
in corruption of local variables, after which anything could happen.

That said, hardware problems are still a possibility.

Ian




Re: VM Corruption - stumped, anyone have any ideas?

2001-09-24 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Matt Dillon writes:
>
>Hmm.  Do we have a guard page at the base of the per process kernel
>stack?

As I understand it, no. In RELENG_4 there are UPAGES (== 2 on i386)
pages of per-process kernel state at p->p_addr. The stack grows
down from the top, and struct user (sys/user.h) sits at the bottom.
According to the comment in the definition of struct user, only
the first three items in struct user are valid in normal running
conditions:

8192
???
8176p_addr

So if the stack does overflow, p_timer[ITIMER_PROF] is about the
first noticeable thing that gets clobbered, causing a SIGPROF
signal delivery to the process some time later.

Ian




Re: bleh. Re: ufs_rename panic

2001-10-02 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Matt Dillon writes:
>What I've done is add a SOFTLOCKLEAF capability to namei().  If set, and
>the file/directory exists, namei() will generate an extra VREF() on 
>the vnode and set the VSOFTLOCK flag in vp->v_flag.  If the vnode already
>has VSOFTLOCK set, namei() will return EINVAL.

I just tried a more direct approach, which is to implement a flag
at the vnode layer that is roughly equivalent to UFS's IN_RENAME
flag. This keeps the changes local to vfs_syscalls.c except for
the addition of a new vnode flag in vnode.h.

A patch is below. It doesn't include the changes to remove IN_RENAME
etc, but these could be done later anyway.

The basic idea is that the rename syscall locks the source node
just for long enough to mark it with VRENAME. It then keeps an
extra reference on the source node so that it can clear VRENAME
before returning. The syscalls unlink(), rmdir() and rename() also
check for VRENAME before proceeding with the operation, and act
appropriately if it is found set. One case that is not being handled
well is where the target of a rename has VRENAME set; the patch just
causes rename to return EINVAL, but a better approach would be to
unlock everything and try again. I don't know how to deal with the
case of vn_lock(fvp, ...) failing at the end of rename() either.
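
The flag protocol described above can be modelled in a few lines of single-threaded userland C. This is only a sketch of the idea, with no locking at all; `struct toy_vnode` and the `toy_*` functions are hypothetical names, and only the VRENAME value is taken from the patch.

```c
#include <errno.h>

#define VRENAME	0x00800		/* flag value used in the patch */

/* Minimal stand-in for a vnode: just flags and a reference count. */
struct toy_vnode {
	int v_flag;
	int v_usecount;
};

/* Start of rename(): mark the source, or claim it is already gone. */
static int
toy_rename_begin(struct toy_vnode *vp)
{
	if (vp->v_flag & VRENAME)
		return (ENOENT);
	vp->v_flag |= VRENAME;
	vp->v_usecount++;	/* extra reference so the flag can be cleared */
	return (0);
}

/* End of rename(): clear the mark and drop the extra reference. */
static void
toy_rename_end(struct toy_vnode *vp)
{
	vp->v_flag &= ~VRENAME;
	vp->v_usecount--;
}

/* unlink()/rmdir(): refuse to touch a node that is being renamed. */
static int
toy_unlink(struct toy_vnode *vp)
{
	if (vp->v_flag & VRENAME)
		return (ENOENT);
	return (0);
}
```

While a rename is in progress, both a concurrent unlink and a second rename of the same source see ENOENT; once the flag is cleared they proceed normally.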

Only lightly tested, so expect lots of bugs...

Ian

Index: sys/vnode.h
===
RCS file: /dump/FreeBSD-CVS/src/sys/sys/vnode.h,v
retrieving revision 1.157
diff -u -r1.157 vnode.h
--- sys/vnode.h 13 Sep 2001 22:52:42 -  1.157
+++ sys/vnode.h 2 Oct 2001 19:06:41 -
@@ -163,8 +163,8 @@
 #defineVXLOCK  0x00100 /* vnode is locked to change underlying type */
 #defineVXWANT  0x00200 /* thread is waiting for vnode */
 #defineVBWAIT  0x00400 /* waiting for output to complete */
+#defineVRENAME 0x00800 /* rename operation on progress */
 #defineVNOSYNC 0x01000 /* unlinked, stop syncing */
-/* open for business0x01000 */
 #defineVOBJBUF 0x02000 /* Allocate buffers in VM object */
 #defineVCOPYONWRITE0x04000 /* vnode is doing copy-on-write */
 #defineVAGE0x08000 /* Insert vnode at head of free list */
Index: kern/vfs_syscalls.c
===
RCS file: /dump/FreeBSD-CVS/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.206
diff -u -r1.206 vfs_syscalls.c
--- kern/vfs_syscalls.c 22 Sep 2001 03:07:41 -  1.206
+++ kern/vfs_syscalls.c 2 Oct 2001 20:29:54 -
@@ -1573,6 +1573,9 @@
if (vp->v_flag & VROOT)
error = EBUSY;
}
+   /* Claim that the node is already gone if it is being renamed. */
+   if (vp->v_flag & VRENAME)
+   error = ENOENT;
if (vn_start_write(nd.ni_dvp, &mp, V_NOWAIT) != 0) {
NDFREE(&nd, NDF_ONLY_PNBUF);
vrele(vp);
@@ -2879,20 +2882,29 @@
struct mount *mp;
struct vnode *tvp, *fvp, *tdvp;
struct nameidata fromnd, tond;
-   int error;
+   int err1, error;
 
bwillwrite();
-   NDINIT(&fromnd, DELETE, WANTPARENT | SAVESTART, UIO_USERSPACE,
-   SCARG(uap, from), td);
+   NDINIT(&fromnd, DELETE, WANTPARENT | LOCKLEAF | SAVESTART,
+   UIO_USERSPACE, SCARG(uap, from), td);
if ((error = namei(&fromnd)) != 0)
return (error);
fvp = fromnd.ni_vp;
-   if ((error = vn_start_write(fvp, &mp, V_WAIT | PCATCH)) != 0) {
+   if (fvp->v_flag & VRENAME)
+   /* The node is being renamed; claim it has already gone. */
+   error = ENOENT;
+   if (!error)
+   error = vn_start_write(fvp, &mp, V_WAIT | PCATCH);
+   if (error) {
NDFREE(&fromnd, NDF_ONLY_PNBUF);
vrele(fromnd.ni_dvp);
-   vrele(fvp);
+   vput(fvp);
+   fvp = NULL;
goto out1;
}
+   fvp->v_flag |= VRENAME;
+   vref(fvp);
+   VOP_UNLOCK(fvp, 0, td);
NDINIT(&tond, RENAME, LOCKPARENT | LOCKLEAF | NOCACHE | SAVESTART | NOOBJ,
UIO_USERSPACE, SCARG(uap, to), td);
if (fromnd.ni_vp->v_type == VDIR)
@@ -2929,6 +2941,10 @@
!bcmp(fromnd.ni_cnd.cn_nameptr, tond.ni_cnd.cn_nameptr,
  fromnd.ni_cnd.cn_namelen))
error = -1;
+   if (tvp != NULL && (tvp->v_flag & VRENAME)) {
+   /* XXX, should just unlock everything and retry. */
+   error = EINVAL;
+   }
 out:
if (!error) {
VOP_LEASE(tdvp, td, td->td_proc->p_ucred, LEASE_WRITE);
@@ -2961,6 +2977,18 @@
ASSERT_VOP_UNLOCKED(tond.ni_dvp, "rename");
ASSERT_VOP_UNLOCKED(tond.ni_vp, "rename");
 out1:
+   if (fvp != NULL) {
+   /* We set the VRENAME flag a

Re: patch #3 (was Re: bleh. Re: ufs_rename panic)

2001-10-03 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Matt Dillon writes:
>
>:This seems rather large compared to Ian Dowse's version..  Are you sure that
>:you're doing this the right way?  Adding a whole new locking mechanism
>:when the simple VRENAME flag to be enough seems like a bit of overkill..

Matt addresses the problem more completely than my patch does, so
the differences in patch size and files touched are to be expected.
In particular, the NFS server and unionfs code need to be changed
in the same way as the syscalls, and the IN_RENAME flag can be
removed from the ufs code, both of which are included in Matt's
patch.

>Ian's doesn't fix any of the filesystem semantics bugs, it only prevents
>the panic from occurring.

This is certainly correct, though the IN_RENAME flag in the UFS
code currently has a few such semantics bugs where EINVAL can be
returned in cases that would succeed if rename() was atomic. When
a vnode cannot be renamed/unlinked/rmdir'd because it is being
renamed, the operation should be retried until it succeeds, sleeping
as necessary. As I understand it, this is mostly dealt with by
Matt's patch, but not at all by mine.

>If you remove the filesystem semantics fixes from my patch you 
>essentially get Ian's patch except that I integrated the vnode flag
>in namei/lookup whereas Ian handles it manually in the syscall code.

The addition of the SOFTLOCKLEAF code is quite a major change, so
it would be very useful if you could describe exactly what it does,
what its semantics are, and how it fits into the rename problem.

My understanding of the problem is that VOP_RENAME is quite unique
in that it is the only VOP that must modify entries in two separate
directories. To avoid deadlock, it is not possible (very hard anyway)
to lock all 4 vnodes (source node, source parent, target node,
target parent) before calling VOP_RENAME. Instead, the approach
taken is to lock only the target node and its parent, and have the
VOP_RENAME implementation jump back and forth between locking the
source and locking the target as necessary. Hence VOP_RENAME is
the only VOP that must modify a node that is passed in unlocked.

Because the source node and parent are not locked, there is the
possibility that the source node could be renamed or removed at
any time before VOP_RENAME finally gets around to locking it and
removing it. Something needs to protect the source node against
being renamed/removed between the point that the source node is
initially looked up and the point that it is finally locked. Both
Matt's SOFTLOCKLEAF and the VRENAME flag are there to provide this
protection.

It is the fact that this problem is entirely unique to VOP_RENAME
that leads me to think that adding the generic SOFTLOCKLEAF code
is overkill. The following fragment also suggests that maybe the
approach doesn't actually fit in that well:

fromnd.ni_cnd.cn_flags &= ~SOFTLOCKLEAF;/* XXX hack */
error = VOP_RENAME(fromnd.ni_dvp, fromnd.ni_vp, &fromnd.ni_cnd,
tond.ni_dvp, tond.ni_vp, &tond.ni_cnd);
fromnd.ni_cnd.cn_flags |= SOFTLOCKLEAF;
NDFREE(&fromnd, NDF_ONLY_PNBUF & NDF_ONLY_SOFTLOCKLEAF);

The way that vclearsoftlock() is used to clear a flag in an unlocked
vnode is also not ideal. This should probably be protected at least
by v_interlock as other flags are.


The syscalls that need to be changed (rename, unlink, rmdir) could
possibly use vn_* style wrapper functions to reduce the amount of
code that must understand the new locking mechanism, although I'm
not sure if this is practical for the NFS case. It might also be
a good time to remove the WILLRELE from VOP_RENAME, which would
simplify some of the surrounding code.

Ian




Re: problems with recurring SIGTRAP under gdb

2001-10-29 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, k Macy writes:
>Any idea why when I insert a breakpoint I get a
>SIGTRAP 
>and can't continue any further? Is this a bug in the 

I've seen this on applications that use SIGIO on stdin. If this is the
case, a workaround is to disable the SIGIO signal while using the
debugger, e.g:

(gdb) set $oldsigio = signal(23, (void *)1)

The signal handler can be put back later with:

(gdb) call signal(23, (void *)$oldsigio)
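
For reference, the two gdb commands correspond to the following C calls. On FreeBSD, signal 23 is SIGIO and `(void *)1` is the value of SIG_IGN, so the first command ignores SIGIO and saves the previous handler. The helper names `sigio_disable()` and `sigio_restore()` are hypothetical; this is a sketch of the equivalent calls, not something the debugged program needs to contain.

```c
#include <signal.h>

/* Plays the role of the gdb convenience variable $oldsigio. */
static void (*saved_sigio)(int);

/* Equivalent of: set $oldsigio = signal(23, (void *)1) */
static void
sigio_disable(void)
{
	saved_sigio = signal(SIGIO, SIG_IGN);
}

/* Equivalent of: call signal(23, (void *)$oldsigio) */
static void
sigio_restore(void)
{
	signal(SIGIO, saved_sigio);
}
```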

Ian




Re: FreeBSD on vmware

2001-11-13 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Robert Watson writes:
>I've had -STABLE run fine, but of late have had a lot of trouble with
>-current.  Userland processes during the boot sequence seem to spend a lot
>of time just spinning -- it's not clear to me what the cause is, and I
>haven't had time to debug.

Someone mentioned on a list somewhere that vmware takes forever to
emulate the cmpxchg instruction, and that using the I386_CPU version
of atomic_cmpset_int() helps a lot. I noticed a major vmware slowdown
with -current sometime in September, so I tried avoiding the
cmpxchg's and things got much faster. Below is the patch I use
(using this outside vmware on SMP hardware is a bad idea :-).
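
For readers unfamiliar with the primitive, the contract of atomic_cmpset_int() (returns 0 on failure, non-zero on success) can be written out in plain C. The function below, `cmpset_model()`, is a hypothetical model of the semantics only: it is not atomic and is not the kernel's implementation. As far as I understand it, the I386_CPU variant avoids cmpxchg by doing an ordinary compare and store with interrupts disabled, which is why it is only safe on a uniprocessor.

```c
/*
 * Model of the compare-and-set contract: if *dst equals exp, store src
 * and report success; otherwise leave *dst alone and report failure.
 * NOT atomic; for illustration of the semantics only.
 */
static int
cmpset_model(volatile unsigned int *dst, unsigned int exp, unsigned int src)
{
	if (*dst != exp)
		return (0);
	*dst = src;
	return (1);
}
```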

Ian

Index: atomic.h
===
RCS file: /dump/FreeBSD-CVS/src/sys/i386/include/atomic.h,v
retrieving revision 1.21
diff -u -r1.21 atomic.h
--- atomic.h	2001/10/08 20:58:24	1.21
+++ atomic.h	2001/10/09 18:35:25
@@ -111,7 +111,7 @@
  * Returns 0 on failure, non-zero on success
  */
 
-#if defined(I386_CPU)
+#if defined(I386_CPU) || 1
 static __inline int
 atomic_cmpset_int(volatile u_int *dst, u_int exp, u_int src)
 {




Re: hot swap with ugen

2001-11-21 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Srinivas Dharmasanam writes:
>Hi,
>I'm using the generic usb device drive ugen for controlling a USB device. 
>The problem is I'm having to reboot the computer each time I 
>disconnect/connect the device in order for FreeBSD to see the USB device.

Are you running usbd (usbd_enable="YES" in /etc/rc.conf)?

Ian




Re: fujitsu MO drive: DA_Q_NO_SYNC_CACHE

2001-11-24 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, "W.Scholten" writes:
>I submitted a bugreport & patch for 3.3 /4.1 a year ago, but on
>installing 4.4 a while back, I found it had not been incorporated.

It's in -current and -stable now. Sorry for the delay.

Ian




Re: FreeBSD on vmware

2001-11-28 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Makoto Matsushita writes:
>I really know I'm doing a stupid thing, but here is benchmark results
>of both "plain" and "patched" 5-current (as of Nov/26/2001).  Patched
>FreeBSD is about 10% faster than before.

... but only if you spend most of your time running CPU benchmarks :-)
Your results show a 50-100% speed increase for operations requiring
a lot of kernel activity. Remember also that interrupts etc. cause
a background rate of cmpxchg instructions that is quite high. On
slower CPUs (I was using a 400MHz PII), the interrupts can soak up
virtually all of the available processing capacity without the
patch. I suspect this effect is responsible for the most dramatic
speedups.

Ian




Re: bin/32261: dump creates a dump file much larger than sum of dumped files

2001-12-04 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Bernd Walter writes:
>> Is there any reason we don't want to truncate the file? Does O_TRUNC
>> not work well if the file is a tape device or something?
>
>I don't expect O_TRUNC to work on devices such as tapes and disks.

Well, it won't achieve anything on tapes or disk devices, but it
should be completely harmless to add the O_TRUNC flag. The current
behaviour is likely to be unexpected and cause confusion so it
might as well be changed. I'll commit this later unless someone
can think of a good reason not to.

Ian




Re: bin/32261: dump creates a dump file much larger than sum of dumped files

2001-12-04 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
>Woa!  That sounds like a bad idea to me.  If you want to do it right
>then open(), fstat(), and only if the stat says it is a regular file
>do you then ftruncate().  Passing O_TRUNC to a tape device may be ignored
>by us, but it's not a valid flag to pass to a tape device and we shouldn't
>do it.

Yeah, I guess checking the file type first makes more sense. I tend
to use shell `>' redirects a lot when accessing tape devices. They
unconditionally add O_TRUNC, so I know I'd be very surprised if
there were side-effects! However for dump I agree that it's best
not to make such assumptions.
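
The open(), fstat(), ftruncate() sequence suggested above might look roughly like this. It is a minimal sketch rather than the change that went into dump; the function name `open_output()` and the choice of open flags and mode are assumptions made for the example.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path' for writing and truncate it only if it turns out to be a
 * regular file, so tapes and other devices never see a truncate request.
 * Returns a file descriptor, or -1 on error.
 */
static int
open_output(const char *path)
{
	struct stat sb;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT, 0666);
	if (fd < 0)
		return (-1);
	if (fstat(fd, &sb) < 0) {
		close(fd);
		return (-1);
	}
	if (S_ISREG(sb.st_mode) && ftruncate(fd, 0) < 0) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```

Checking the type on the already-open descriptor, rather than calling stat() on the path first, avoids a window in which the path could be replaced between the check and the open.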

Ian




Re: switching to real mode

2001-12-06 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, John Baldwin writes:
>The short form is that you need to hack the cpu_halt to call a function that
>puts a stub down in low memory, and calls it.  This code needs to be mapped 1:1
>so that the logical address == physical address.  The first thing you will

Yeah, I attempted something like this a few years ago without much
success. I've just updated the code to compile on -stable, and it
seems to half-work in that it appears to successfully switch to
real mode and clear the screen using the video BIOS, but then it
just hangs. That's pretty close to what I remember it doing
originally, although I think it might have worked before the VM86
stuff was enabled by default in FreeBSD. Getting this sort of code
to work reliably is almost impossible... Source is at

http://www.maths.tcd.ie/~iedowse/FreeBSD/diskboot/

(loading the resulting KLD immediately shuts down to real mode).

Ian




Re: Hyperthreading slowdown

2003-10-04 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Kris Kennaway writes:
>Yes, that's because (as discussed in the archives) the kernel treats
>it like an extra, completely decoupled physical CPU and schedules
>processes on it without further consideration.  This is presumably the
>cause of the slowdown, because it's only efficient to use the virtual
>CPU under certain workload patterns.  HTT is not magic performance
>beans.

Try also setting the sysctl variable "machdep.cpu_idle_hlt" to 1, as
it doesn't help to have the idle logical CPUs spinning.

Ian
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Need review of NFS patch set for server .. missing/wrong vput() issues

2002-01-12 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
>Patch section 1
>
>   Here we were previously vput()ing nd.ni_vp only if error == 0.
>   If error is returned non-zero from namei() this would normally be
>   correct.  However, we force error on a number of occasions after
>   namei() succeeds, in which case nd.ni_vp may be non-NULL and we
>   must release it.  This fixes it so nd.ni_vp is vput()'d if it is
>   non-NULL whether an error is specified at this point or not.

I don't think this is necessary, because the cleanup code at the
end of nfsrv_mknod() catches any cases where nd.ni_vp was not
released earlier. It would be harmless to add it though.

>   (I believe this may have been Alexey's 'NFS hangs in inode state'
>   problem, which occurs if you are running innd over an NFS filesystem)

Was that a client-side or server-side issue?

>Patch sections 2 & 3
>
>   Here namei() is called only with LOCKPARENT, which means that the
>   leaf is not locked.  So when releasing the vnodes we should not
>   have the if (vp == dvp) test, we should just vput() the dvp and
>   vrele the vp.

Hmm, it seems that lookup() doesn't actually leave the parent locked
in this case (it probably should), so I think the existing code is
correct in that distorted sense of `correct'. The exit code in
lookup() is:

    if ((cnp->cn_flags & LOCKLEAF) == 0)
        VOP_UNLOCK(dp, 0, td);
    return (0);

I tried reproducing the vp == dvp case in nfsrv_link by attempting
to create a link called `/somedir/.' to an existing regular file
(I did this at the protocol level; I'm not sure if you can do this
easily from a normal client). Instrumentation confirmed that the
code in question does get executed with vp == dvp, but I saw no
problems or panics either with or without your patch (!). It seems
we don't have any VFS locking assertions compiled in even with
INVARIANTS... When I added some assertions, your patch triggered my
"vput: vnode not locked" error as soon as the weird link operation
was repeated, but the existing code works fine.

We really need some basic locking assertions such as checking that
a vnode is locked when you vput it, and checking that it isn't
locked when the last reference is vrele'd. This is complicated by
the fact that we have at least 3 different types of vnode locking:
vop_stdlock (ufs etc), vop_sharedlock (nfs), and vop_nolock (devfs,
procfs etc). Maybe a VOP_LOCKASSERT would help, because VOP_ISLOCKED
isn't useful for vop_nolock filesystems. Note that there are the
`options DEBUG_VFS_LOCKS' assertions, but these are used in ways
that can result in false positives.
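
As a rough userland illustration of the two checks described above (a toy model only, not the kernel's vnode code), assume a vnode is reduced to a reference count and one lock flag. The names toy_vnode, toy_vput and toy_vrele are invented for this sketch; the real vput/vrele do considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vnode: a reference count plus a single lock flag. */
struct toy_vnode {
	int  refs;
	bool locked;
};

enum toy_result {
	TOY_OK = 0,
	TOY_ERR_NOT_LOCKED,      /* vput-like call on an unlocked vnode */
	TOY_ERR_LOCKED_LAST_REF, /* last reference dropped while still locked */
	TOY_ERR_NO_REFS          /* no reference left to drop */
};

/* vrele-like: drop one reference; the last one must not be dropped locked. */
enum toy_result toy_vrele(struct toy_vnode *vp)
{
	assert(vp != NULL);
	if (vp->refs <= 0)
		return TOY_ERR_NO_REFS;
	if (vp->refs == 1 && vp->locked)
		return TOY_ERR_LOCKED_LAST_REF;
	vp->refs--;
	return TOY_OK;
}

/* vput-like: unlock, then drop one reference; the vnode must be locked. */
enum toy_result toy_vput(struct toy_vnode *vp)
{
	assert(vp != NULL);
	if (vp->refs <= 0)
		return TOY_ERR_NO_REFS;
	if (!vp->locked)
		return TOY_ERR_NOT_LOCKED;
	vp->locked = false;
	vp->refs--;
	return TOY_OK;
}
```

In this model, calling toy_vput() on a vnode that lookup left unlocked returns TOY_ERR_NOT_LOCKED instead of silently succeeding, which is the behaviour the "vput: vnode not locked" assertion was added to catch.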

Ian




Re: Need review of NFS patch set for server .. missing/wrong vput() issues

2002-01-12 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Ian Dowse writes:
>I don't think this is necessary, because the cleanup code at the
>end of nfsrv_mknod() catches any cases where nd.ni_vp was not
>released earlier. It would be harmless to add it though.

Oops, I missed a 'return (0);' when reading the code. You're quite
correct here; the first part of the patch looks correct, and could
certainly cause vput's to be forgotten. I'll try to reproduce this
now.

It's just the vp == dvp stuff that is ok as it is.

Ian




Re: Need review of NFS patch set for server .. missing/wrong vput() issues

2002-01-12 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
>Ok, cool.  I'll get the commit gears started for the 
>first part of the patch.

FYI, I was able to reproduce this and confirm that the first part
of your patch fixes it. All that it takes is for the mknod to fail
because the name already exists, but normally this is masked by the
client because it does an NFSPROC_ACCESS RPC first.

Another nasty bug in nfsrv_mknod that I just spotted is that it
doesn't override the S_IFMT bits of the file mode supplied by the
client. It should be completely ignoring those bits, and using only
the node-type it has in the `vtyp' variable. I just managed to
create a node that makes ls say "Bad file descriptor" by passing
in a type of NFFIFO and a mode of 0...

Ian




Re: Need review of NFS patch set for server .. missing/wrong vput() issues

2002-01-16 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
>NFS fix).  I think Ian's mknod tests are a no-brainer.  They should
>just go in, as should my mknod fix.

I agree here - Matt's mknod fix and the S_IFMT mode bits corruption
bug that I fixed are simple fixes and they are both effectively
remotely exploitable (but only if you are running an NFS server,
and generally only by hosts listed in /etc/exports). The first bug
causes all processes to get stuck in state `inode', and the second
causes filesystem corruption that requires a manual fsck to fix.
Matt's mknod bug occurred during normal operation, but the other
probably only happens with a hostile client.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/nfsserver/nfs_serv.c
mknod bug: revision 1.114
S_IFMT bug: revision 1.113

>#1 Fix corruption that can occur if a RW mount is downgraded to RO
>#2 Fix spl confusion that can occur in ACQUIRE_LOCK*() softupdates
>  routines 
>#3 Fix softupdates panic that can occur during heavy I/O
>  (see 'drain_output' calls in patch below)
>
>I have included Kirk's patch (for stable) below for review.  It's a bit
>messy so I will note that the most important fix is #3 above, and it is
>a very simple and tiny portion of the below patch.

I'm not so sure about these. #3 looks simple on its own I suppose.
#1 has been around for years, and although annoying, the corruption
is simply that some blocks don't get freed until the next real fsck.
This fix was only committed to -current yesterday, and it has already
caused one problem there, so it's not looking too good from a gain
vs. risk POV :-) I'm not sure about #2 either; the patch isn't too
complex, but it's a bit strange.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/ffs_softdep.c
#2: 1.104
#3: 1.103

BTW is the VDRAINED stuff in your patch just left over from something
else? It doesn't seem to be present in -current.

Ian




Re: Porting a userland NFS server

2002-03-14 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Daniel O'Connor writes:
>I end up with EFBIG when trying to read the .katie-server-info file, but
>if I create a file inside the view (eg echo "abc" >foo) then it can be
>read  with no problem, _but_ the dump of NFS traffic doesn't show a read
>for that file.

At a guess, the server is incorrectly reporting the maximum file
size. You might be able to verify this by creating a file of the
same size as .katie-server-info and checking if you get the same
error. The bug in the server is likely to be in its "fsinfo" op
function - see the FSINFO3resok definition in RFC1813 for how the
fsinfo reply is supposed to be formed.

Ian




Re: mmap and efence

2002-03-19 Thread Ian Dowse

In message , Kip Macy writes:
>Looking at the source for efence this happens when mmap fails (in this case
>with ENOMEM). Looking at the man page the two possibilities are: the system has
>reached the per-process mmap limit specified in the vm.max_proc_mmap sysctl or
>insufficient memory was available. *BSD limits the maximum amount of memory
>that a process can mmap to swap+physical.

I've also found it useful to increase the value of MEMORY_CREATION_SIZE
in the ElectricFence source. Setting this to larger than the amount
of address space ever used by the program seems to avoid the
vm.max_proc_mmap limit; maybe when ElectricFence calls mprotect()
to divide up its allocated address space, each part of the split
region is counted as a separate mmap.

I came across this before while debugging perl-Tk, and one other
issue was that the program ran fantastically slowly; a trivial
script that normally starts in a fraction of a second was taking
close to an hour to get there on quite fast hardware. You expect
ElectricFence to make things slow, but not quite that slow :-)

Ian




Re: kernel backtrace of sleeping processes

2002-04-22 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Robert Watson writes:
>Sigh.  Remote gdb, not ddb.  I tried the usual tricks (updating $sp in
>gdb, etc) but gdb persisted in using the old frame.  Nevermind.  It seemed

In gdb, the "proc" command switches processes, so this should work:

proc 
bt

Ian




Re: CPU context switching/load numbers

2002-05-02 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Jason 
Borkowsky writes:
>1. How is it my load average is over 1, but my single CPU is 85% idle?

This is quite possible due to process synchronisation, since there
is no direct relationship between the load average and the percentage
of time that the CPU is idle. The load average is a measure of the
average number of processes that are in the "runnable" state, but
obviously on a single-CPU machine, only one of them can actually
be running at a time.

As an example, consider the case where 2 processes are each "runnable"
50% of the time, but the times are synchronised. Half of the time
there are 2 runnable processes, and the other half of the time there
are no runnable processes. The load average will be 1.0 since the
average number of runnable processes is 1, but there are no processes
running half of the time, so the CPU is 50% idle.
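
The arithmetic in that example can be checked with a small sketch. It assumes time is divided into equal ticks and that the load figure is a plain mean of the runnable count per tick; the real load average is a damped average, so this only illustrates the principle. The names load_stats and summarize are invented here.

```c
#include <assert.h>
#include <stddef.h>

struct load_stats {
	double avg_runnable;  /* mean number of runnable processes per tick */
	double idle_fraction; /* fraction of ticks with nothing runnable */
};

/* runnable[t] holds the number of runnable processes during tick t. */
struct load_stats summarize(const int *runnable, size_t nticks)
{
	struct load_stats s = {0.0, 0.0};
	long total = 0;
	size_t idle = 0;

	assert(runnable != NULL && nticks > 0);
	for (size_t t = 0; t < nticks; t++) {
		total += runnable[t];
		if (runnable[t] == 0)
			idle++;
	}
	s.avg_runnable = (double)total / (double)nticks;
	s.idle_fraction = (double)idle / (double)nticks;
	return s;
}
```

Feeding it the synchronised pattern 2, 0, 2, 0, ... gives an average of 1 runnable process with the CPU idle half of the time.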

Ian




Re: /usr/src/sys/kern/kern_sig.c

2002-05-06 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Marc Olzheim writes:,
Marco van de Voort writes:
>While working on the FreePascal FreeBSD port, we found a bug in the
>kernel source, that has been fixed in -CURRENT...
>Any reason why patches 1.137 and 1.148 of kern_sig.c have not yet been
>committed to RELENG_4 ?

Are these really the revisions you mean? 1.137 is completely harmless,
and 1.148 is limited to the case where you define the undocumented
option "COMPAT_SUNOS".

Ian

REV:1.148   kern_sig.c  2002/02/15 03:54:01   bde

   Fixed a typo in rev.1.65 that gave a reference to a nonexistent variable.
   This was not detected by LINT because LINT is missing COMPAT_SUNOS.

REV:1.137   kern_sig.c  2001/10/07 16:11:37   iedowse

   Fix a typo in do_sigaction() where sa_sigaction and sa_handler were
   confused. Since sa_sigaction and sa_handler alias each other in a
   union, the bug was completely harmless. This had been fixed as part
   of the SIGCHLD changes in revision 1.125, but it was reverted when
   they were backed out in revision 1.126.







Re: can't mount cdrom 4.6-RELEASE

2002-06-20 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, jogegabsd writes:
>I just upgrade to 4.6-RELEASE.
...
># mount_cd9660 /dev/acd0c /cdrom
>/dev/acd0c: Device not configured

What way did you upgrade? The device minor number for acdXc changed
between 4.5 and 4.6, so you need to ensure that you have an up-to-date
/dev/MAKEDEV as well as re-running "sh MAKEDEV acd0". If you did a
buildworld, you probably forgot the mergemaster step or did it in
the wrong order.

The output of "ls -l /dev/acd0c" should look something like:

crw-r-----  4 root  operator  117,   0 Apr 27 20:24 /dev/acd0c

Ian




Re: How does swap work address spacewise?

2002-07-06 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Bernd Walter writes:
>I never saw any negative block numbers in on-disc structures.
>Now I wonder if it was just hidden behind macros.
>What is the reason to handle it that way?
>Do you have some code reference for homework?

These logical block numbers are not stored on disk; they are just
used by the filesystem code to refer to block numbers within a file
relative to the start of the file. The on-disk format uses direct
and indirect block pointers to refer to the actual filesystem blocks,
and it is easy to get from a lbn to the sequence of indirection
blocks necessary to find the on-disk data. See ufs_getlbns() in
sys/ufs/ufs/ufs_bmap.c for details.
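
To give a feel for that mapping, here is a much simplified sketch (not ufs_getlbns() itself) that only answers how many levels of indirection a non-negative data lbn needs. It assumes 12 direct pointers per inode and takes the number of block pointers per indirect block as a parameter; TOY_NDADDR and indir_level are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NDADDR 12	/* direct block pointers per inode (assumed) */

/*
 * Return 0 if lbn is reached through a direct pointer, 1 to 3 for
 * single, double or triple indirection, or -1 if it is out of range.
 * nindir is the number of block pointers held by one indirect block.
 */
int indir_level(int64_t lbn, int64_t nindir)
{
	int64_t span = 1;

	assert(lbn >= 0 && nindir > 1);
	if (lbn < TOY_NDADDR)
		return 0;
	lbn -= TOY_NDADDR;
	for (int level = 1; level <= 3; level++) {
		span *= nindir;		/* blocks covered at this level */
		if (lbn < span)
			return level;
		lbn -= span;
	}
	return -1;
}
```

For example, with 2048 pointers per indirect block, lbn 11 is direct, lbn 12 is the first single-indirect block and lbn 12 + 2048 is the first double-indirect block.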

>> These are logical block numbers, which are fragment-sized (1K typically)

(lbns are actually in block-sized, not fragment-sized units, since
a single file block is always contiguous on the disk even if it
does not begin on a disk block boundary or is not a full block in
size. Physical UFS block numbers (ufs_daddr_t in the code) are in
fragment-sized units.)

>> Physical block numbers are 512-byte sized, with a range of 2^32
>> in -stable.  This also winds up being 2TB.  So increasing the fragment
>> size does not help in -stable.
>It's a proven fact that there is a 1T limit somewhere which was
>explained with physical block numbers being signed.

Yes, the daddr_t type is signed, so the real limit for filesystems is
1TB I think.
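
The 1TB versus 2TB figures follow from simple arithmetic, sketched below under the assumptions stated in the thread: a 32-bit block number counting 512-byte units. The helper name max_addressable_bytes is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes addressable by a block number of the given width, where each
 * block number refers to unit_size bytes.  A signed type loses one bit.
 */
uint64_t max_addressable_bytes(unsigned bits, int is_signed, uint64_t unit_size)
{
	assert(bits >= 2 && bits <= 32);
	unsigned usable = is_signed ? bits - 1 : bits;
	return ((uint64_t)1 << usable) * unit_size;
}
```

A signed 32-bit number of 512-byte blocks gives 2^31 * 2^9 = 2^40 bytes (1TB); treating the same field as unsigned would give 2^41 bytes (2TB).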

Ian




Re: kevent and pipes interaction on 4.6-STABLE

2002-08-26 Thread Ian Dowse

In message <20020826225851.GA93947@gallium>, Dominic Marks writes:
>+static int kq = -1;
>+int kq, rv, idx;
>kevent(0x3,0xbfbfedbc,0x1,0x0,0x0,0x0)   = 0 (0x0)
>kevent(0x809abc0,0x0,0x0,0xbfbfede0,0x8,0x0) ERR#9 'Bad file descriptor'

Look at the above 4 lines, and it is pretty clear what is going on.
You don't want to hide the global `kq' behind an uninitialised local
variable of the same name.
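
A reduced sketch of the same mistake, with invented function names: the file-scope `kq` is set up correctly, but a function that declares its own `kq` never sees it. The local is given a fixed bogus value here so that the example is well defined; in the quoted code it was uninitialised, which is why kevent() was handed a garbage descriptor.

```c
#include <assert.h>

static int kq = -1;	/* file-scope descriptor, as in the quoted diff */

/* Record the descriptor in the file-scope variable. */
void queue_open(int fd)
{
	assert(fd >= 0);
	kq = fd;
}

/* Buggy shape: the local declaration hides the file-scope kq. */
int queue_fd_shadowed(void)
{
	int kq = -12345;	/* stands in for uninitialised garbage */
	return kq;
}

/* Fixed shape: no local declaration, so the file-scope kq is used. */
int queue_fd(void)
{
	return kq;
}
```

Compilers such as gcc and clang can flag this pattern when invoked with -Wshadow.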

Ian




Re: USB->ATA devices

2002-08-27 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Soeren Schmidt writes:
>It should be possible to hide the USB stuff under the ATA_* macros
>or even just under bus_space_*.
>I need a bit more concrete details on how to call into the USB
>code, then it should be pretty easy to add...

This would be hard to do right, as the preferred way to talk to USB
devices is with a request-callback model. The ATA command would
need to be put into a request structure and handed to the USB device
driver, and the USB driver would then call back when the request
completes. There are hacks that can be used to perform the USB
operations synchronously, but they generally do not handle unexpected
removal of the device well at all.

There are many possible ATA/ATAPI over USB protocols, so turning
the ATA request into one or more USB transfers is a bridge-specific
operation. Basically these odd protocols exist because the manufacturers
of the various bridges have decided to cut corners and not implement
the standard USB mass storage interface.

Ian




Re: vmware reads disk on non-sector boundary

2002-10-03 Thread Ian Dowse

In message , Garance A Drosihn writes:
>I also have a partition with freebsd-current from two or three days
>ago, and all the latest versions of the ports.  Every time I try to
>start vmware2 on the newer system, the hardware dies.  Sometimes it
>automatically reboots, other times it freezes up and I have to
>force-reboot it (sometimes by unplugging it from the wall).

See the patch I posted in:


http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=0+6285+/usr/local/www/db/text/2002/freebsd-emulation/20020908.freebsd-emulation

There may still be further issues, but it allowed me to use vmware2
on a current from a week or two ago.

Ian




Re: vmware reads disk on non-sector boundary

2002-10-03 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Mark Santcroos writes:
>On Thu, Oct 03, 2002 at 09:04:04AM +0100, Ian Dowse wrote:
>> There may still be further issues, but it allowed me to use vmware2
>> on a current from a week or two ago.
>
>That's only for virtual disks, and that is not where the problem is (was).
>For most people this is not a solution.

True, it won't fix the problems you reported with raw disks, but
it stops vmware from instantly panicking on recent -currents and
that is the first problem you will encounter with the port.

I tend to run vmware either diskless or with virtual disks, so I
wouldn't notice the raw disk issues.

Ian




gdb support for kernel modules

2002-10-07 Thread Ian Dowse


This is something I have been meaning to investigate for a while:
when gdb encounters a userland executable that uses shared libraries
it automatically adds the symbols from each library, so it seemed
likely that gdb could be made to do the same thing with kernel modules.
I am aware of the existence of `gdbmods' etc, but it would be nicer
to have the support built in to gdb.

Anyway, below is a proof-of-concept patch that does the basics, but
among other things, its logic for locating the kernel module files
needs a lot of work - currently it just assumes /boot/kernel/,
which is almost never what you actually want. It works for debugging
vmcores and live /dev/mem access, but I don't know if it can work
for remote debugging. Does anybody know gdb internals enough to
comment on how this is done or suggest improvements?

Ian

# gdb -k kernel.debug /dev/mem
...
This GDB was configured as "i386-undermydesk-freebsd"...
panic messages:
---
---

warning: skipping first file (kernel)

Reading symbols from /boot/kernel/ufs.ko...done.
Loaded symbols for /boot/kernel/ufs.ko
Reading symbols from /boot/kernel/md.ko...done.
Loaded symbols for /boot/kernel/md.ko
Reading symbols from /boot/kernel/vinum.ko...done.
Loaded symbols for /boot/kernel/vinum.ko
#0  mi_switch () at ../../../kern/kern_synch.c:849
849 td->td_kse->ke_oncpu = PCPU_GET(cpuid);
(kgdb) info sharedlibrary 
warning: skipping first file (kernel)

From        To          Syms Read   Shared Object Library
0xc1098dd0  0xc10bfc10  Yes /boot/kernel/ufs.ko
0xc11e24d0  0xc11e4270  Yes /boot/kernel/md.ko
0xc10cf940  0xc10ddc30  Yes /boot/kernel/vinum.ko
(kgdb) proc 316
(kgdb) bt
#0  mi_switch () at ../../../kern/kern_synch.c:849
#1  0xc01b7d14 in msleep (ident=0xc1292e00, mtx=0x0, priority=76, wmesg=0x0, 
timo=0) at ../../../kern/kern_synch.c:559
#2  0xc11e3052 in md_kthread (arg=0xc1292e00) at /usr/src/sys/dev/md/md.c:578
#3  0xc019d2e5 in fork_exit (callout=0xc11e2fd0 , arg=0x0, 
frame=0x0) at ../../../kern/kern_fork.c:853
(kgdb)


Index: Makefile
===
RCS file: /dump/FreeBSD-CVS/src/gnu/usr.bin/binutils/gdb/Makefile,v
retrieving revision 1.61
diff -u -r1.61 Makefile
--- Makefile29 Jun 2002 03:16:10 -  1.61
+++ Makefile7 Oct 2002 10:31:41 -
@@ -37,7 +37,7 @@
ui-file.c ui-out.c wrapper.c cli-out.c \
cli-cmds.c cli-cmds.h cli-decode.c cli-decode.h cli-script.c\
cli-script.h cli-setshow.c cli-setshow.h cli-utils.c cli-utils.h
-XSRCS+=freebsd-uthread.c kvm-fbsd.c
+XSRCS+=freebsd-uthread.c kvm-fbsd.c solib-fbsd-kld.c
 SRCS=  init.c ${XSRCS} nm.h tm.h xm.h gdbversion.c xregex.h
 
 .if exists(${.CURDIR}/Makefile.${TARGET_ARCH})
Index: fbsd-kgdb.h
===
RCS file: /dump/FreeBSD-CVS/src/gnu/usr.bin/binutils/gdb/fbsd-kgdb.h,v
retrieving revision 1.3
diff -u -r1.3 fbsd-kgdb.h
--- fbsd-kgdb.h 18 Sep 2002 16:20:49 -  1.3
+++ fbsd-kgdb.h 6 Oct 2002 23:32:14 -
@@ -7,6 +7,7 @@
 
 extern int kernel_debugging;
 extern int kernel_writablecore;
+extern struct target_so_ops kgdb_so_ops;
 
 #define ADDITIONAL_OPTIONS \
{"kernel", no_argument, &kernel_debugging, 1}, \
Index: kvm-fbsd.c
===
RCS file: /dump/FreeBSD-CVS/src/gnu/usr.bin/binutils/gdb/kvm-fbsd.c,v
retrieving revision 1.42
diff -u -r1.42 kvm-fbsd.c
--- kvm-fbsd.c  18 Sep 2002 16:19:05 -  1.42
+++ kvm-fbsd.c  6 Oct 2002 23:41:56 -
@@ -56,6 +56,7 @@
 #include "bfd.h"
 #include "target.h"
 #include "gdbcore.h"
+#include "solist.h"
 
 static void
 kcore_files_info (struct target_ops *);
@@ -72,6 +73,10 @@
 static int
 xfer_umem (CORE_ADDR, char *, int, int);
 
+#ifdef SOLIB_ADD
+static int kcore_solib_add_stub (PTR);
+#endif
+
 static char*core_file;
 static kvm_t   *core_kd;
 static struct pcb  cur_pcb;
@@ -209,6 +214,12 @@
 
   inferior_ptid = null_ptid;   /* Avoid confusion from thread stuff.  */
 
+  /* Clear out solib state while the bfd is still open. See
+comments in clear_solib in solib.c. */
+#ifdef CLEAR_SOLIB
+  CLEAR_SOLIB ();
+#endif
+
   if (core_kd)
 {
   kvm_close (core_kd);
@@ -305,7 +316,16 @@
   printf ("---\n");
 }
 
-  if (!ontop)
+  if (ontop)
+{
+  /* Add symbols and section mappings for any kernel modules.  */
+#ifdef SOLIB_ADD
+  current_target_so_ops = &kgdb_so_ops;
+  catch_errors (kcore_solib_add_stub, &from_tty, (char *) 0,
+   RETURN_MASK_ALL);
+#endif
+}
+  else 
 {
   warning ("you won't be able to access this core file until you terminate\n"
"your %s; do ``info files''", target_longname);
@@ -651,6 +671,15 @@
   if (set_context ((CORE_ADDR) val))
 error ("invalid proc address");
 }
+
+#ifdef SOLIB_ADD
+static int
+kcore_solib_add_stub (PTR from_ttyp)
+{
+  SO

Re: gdb support for kernel modules

2002-10-08 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Andrew Gallatin 
writes:
>gdbmods does an ugly thing which is incredibly useful.  It assumes
>that the modules you want to debug are sitting in your kernel build
>pool.  So what it does is extract the build directory from the kernel
>(using strings), and runs a find rooted there for the module in
>question.  But it's a shell script, so it can get away with stuff like
>that ;)

Yes, I intend to attempt the same thing by extracting the path from
version[] and using similar logic. It can probably use a list of
likely locations and pick the first one where the module actually
exists. GDB already has the `solib-absolute-prefix' and `solib-search-path'
variables, but they are of limited use for kernel modules as the
paths and module names you want for debugging are usually different
to those that were actually loaded.
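
The "list of likely locations" idea could look roughly like the sketch below. It is plain userland C rather than anything from the gdb sources, the function name first_readable is invented, and how the candidate paths are built from version[] is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Return the index of the first candidate path that exists and is
 * readable, or -1 if none of them is.  NULL entries are skipped.
 */
int first_readable(const char *const *candidates, size_t n)
{
	assert(candidates != NULL);
	for (size_t i = 0; i < n; i++) {
		if (candidates[i] != NULL && access(candidates[i], R_OK) == 0)
			return (int)i;
	}
	return -1;
}
```

The caller would order the candidates from most to least specific, for instance the kernel build directory first and /boot/kernel/ last, so that a module with debugging symbols wins over the installed copy.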

>Perhaps we could embed the build directory somewhere in the ELF headers
>of each kernel module (including the kernel) so that kgdb could find
>the corresponding build file with symbols.  Then your (very cool)
>solib-fbsd-kld.c could easily find the kernel and modules which match
>the kernel you're debugging..

True, even having the path as a variable inside the module should
be sufficient I think. The other clever suggestion that was made
to me was to maintain the standard r_debug* symbols in the kernel
so that a virtually unmodified gdb could extract information about
the loaded modules.

Ian




Re: 4.7 RELEASE crashing when transferring large files over the network

2002-11-06 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Al-Afu writes:
>Yes. I am using the fxp driver. Any other possibilities? Or should I take
>it easy (and stick to 4.6.2-RELEASE) until such time a fix for the fxp
>driver on 4.7-RELEASE is done?

I've checked into -stable the fxp driver change that fixes some
random crashes. It might not be the cause of the crashes you have
seen, but it would be worth trying anyway. Either cvsup to -stable,
or just grab revision 1.110.2.26 of sys/dev/fxp/if_fxp.c from cvsweb
and use it instead of the 4.7-RELEASE version of that file.

Note that the above revision will not fix the problem if you have
"options DEVICE_POLLING" in your kernel config file. A fix for that
case should appear in the next week or two though.

Ian




Re: strange netstat output inside 4.x jails...

2002-12-05 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Josh Brooks
 writes:
>
>I run netstat -i fxp0 while _inside_ a jail:

>and then, I transfer a large file from the jail to some external host.
 
>The file I transferred out was 4.3 megabytes.  Opkts only increased by
>1733 ... which means 2481 bytes per packet ... but ifconfig tells us:

How long did you wait after the transfer completed before checking
netstat? I think the packet counts for fxp interfaces are only
updated about once a second, as the driver periodically polls them
from the hardware (the byte counts are updated immediately though).

Ian




Re: panic: icmp_error: bad length

2002-12-11 Thread Ian Dowse
In message <[EMAIL PROTECTED]>
, Patrick Soltani writes:
>In the last couple of months, upgraded to 4.6 and 4.7 using RELENG_4
>with again no errors, however, now under a light smurf attack, I get:
>
>panic: icmp_error: bad length

>Hardware: Dell PowerEdge 350, 2 built-in Intel nic cards, 256 meg of ram
>and only doing ipfw.
>The kernel is built with options BRIDGE.  Don't know what other info you
>might be interested.
>
>Deeply appreciate any help or info.

Could you try to get a stack trace from the panic? There are
instructions on how to set this up in the Kernel Debugging chapter
of the Developers Handbook at:

   http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html

Even just a list of the function names from DDB would be a good
start, but if possible try to compile a debug kernel, get a full
crash dump and provide the gdb stack trace.

Ian




Re: panic: icmp_error: bad length

2002-12-11 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Alexander Langer
 writes:
>Yeah, same situation here.  4.6 used to work w/o problem, 4.7 doesn't.

Great, thanks for the debugging info. The bug seems to be that
icmp_error() requires that the IP header fields are in host order,
but when it is called on a bridged packet by the IPFW code, this is
not the case. Something like the patch below (untested) should fix
the IPFW1 case. A similar change is needed for IPFW2.

Luigi: does this look reasonable? I'm not familiar enough with the
IPFW code to know if it is OK to modify the mbuf like this. If not
then it needs to be copied first like ip_forward() does, making
sure that the IP header does not end up in a shared cluster.

Ian

Index: ip_fw.c
===
RCS file: /home/iedowse/CVS/src/sys/netinet/ip_fw.c,v
retrieving revision 1.131.2.38
diff -u -r1.131.2.38 ip_fw.c
--- ip_fw.c 21 Nov 2002 01:27:30 -  1.131.2.38
+++ ip_fw.c 12 Dec 2002 00:43:22 -
@@ -1573,6 +1573,11 @@
break;
  }
default:/* Send an ICMP unreachable using code */
+   /* Must convert to host order for icmp_error(). */
+   if (BRIDGED) {
+   NTOHS(ip->ip_len);
+   NTOHS(ip->ip_off);
+   }
icmp_error(*m, ICMP_UNREACH,
f->fw_reject_code, 0L, 0);
*m = NULL;





Re: panic: icmp_error: bad length

2002-12-11 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Luigi Rizzo writes:
>the diagnosis looks reasonable, though i do not remember changing
>anything related to this between 4.6 and 4.7 so i wonder why the
>error did not appear in earlier versions of the code.

Yes strange - actually, it looks like the "THERE IS NO FUNCTIONAL
OR EXTERNAL API CHANGE IN THIS COMMIT" commit may be to blame :-)
Some fragments below.

Ian

bridge.c 1.16.2.2:
+#ifdef PFIL_HOOKS
...
-* before calling the firewall, swap fields the same as IP does.
-* here we assume the pkt is an IP one and the header is contiguous
...
-   ip = mtod(m0, struct ip *);
-   NTOHS(ip->ip_len);
-   NTOHS(ip->ip_off);

ip_fw.c 1.131.2.34:
-   if (0 && BRIDGED) { /* not yet... */
-   offset = (ntohs(ip->ip_off) & IP_OFFMASK);
+   if (BRIDGED) { /* bridged packets are as on the wire */
+   ip_off = ntohs(ip->ip_off);
ip_len = ntohs(ip->ip_len);
} else {





Re: panic: icmp_error: bad length

2002-12-12 Thread Ian Dowse
In message <[EMAIL PROTECTED]>, Robert Watson writes:
>
>BTW, if this bug exists in 5.0 for the same reasons (or even different
>ones), we should try to generate a fix ASAP and get it committed.

I'll check later today if 5.0 is affected. It is probably easy to
trigger by arranging for a bridged packet with ip->ip_len=0x100 to
generate an ICMP reply from a firewall rule.
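
One way to see why such a length is a good trigger: the same two bytes give very different values depending on the byte order they are read in. The sketch below uses two helpers invented for this example, read_be16 and read_le16, and says nothing about the exact check inside icmp_error(); it only shows the misreading on a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret two bytes as a big-endian (network order) 16-bit value. */
uint16_t read_be16(const uint8_t *p)
{
	assert(p != NULL);
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Interpret the same two bytes as a little-endian 16-bit value. */
uint16_t read_le16(const uint8_t *p)
{
	assert(p != NULL);
	return (uint16_t)((p[1] << 8) | p[0]);
}
```

The bytes 0x01 0x00 are 256 in network order but 1 when read as a little-endian host value, which is shorter than any valid IP header, so code that expects the other byte order sees an impossible length.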

Ian




Re: a BSD identd

1999-07-13 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, "Brian F. Feldman" writes:
>On 13 Jul 1999, Ville-Pertti Keinonen wrote:
>
>> 
>> [EMAIL PROTECTED] (Brian F. Feldman) writes:
>> 
>> > It's "out with the bad, in with the good." Pidentd code is pretty terrible.
>> > The only security concerns with my code were wrt FAKEID, and those were
>> > mostly fixed (mostly meaning that a symlink _may_ be opened, but it won't
>> > be read.) If anyone wants to audit my code for security, I invite them to.
>> 
>> Did you mean to avoid reading through symlinks using the open + fstat
>> method mentioned earlier in the thread?
>
>No, I meant to avoid opening a file the user couldn't, or reading from a dev.

Why not actually store the fake ID in a symbolic link? That way you just
do a readlink(), which would be safer, neater and faster than reading a
file. A user can set up a fake ID with something like:

ln -s "Warm-Fuzzy" .fakeid
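
On the daemon side this could be as small as the sketch below. It is only an illustration of the readlink() approach, not code from any identd; the function name read_fake_id is invented. Nothing is ever opened, so a link pointing at a device or at an unreadable file cannot cause trouble, and a regular file named .fakeid simply makes readlink() fail.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the target of the symbolic link at link_path into buf as a
 * NUL-terminated string.  Returns the length, or -1 if link_path is
 * not a symbolic link or cannot be read.  Over-long targets are
 * silently truncated to bufsize - 1 bytes by readlink().
 */
ssize_t read_fake_id(const char *link_path, char *buf, size_t bufsize)
{
	ssize_t n;

	assert(link_path != NULL && buf != NULL);
	if (bufsize == 0)
		return -1;
	n = readlink(link_path, buf, bufsize - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';	/* readlink() does not terminate the string */
	return n;
}
```

A real daemon would still want to validate the returned string (length, allowed characters) before sending it in an ident reply.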

Ian





Re: NFS problems due to getcwd/realpath

1999-07-15 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Jan Conrad writes:
>after wondering for two years why FreeBSD (2.2.x ... 3.2) might lock up
>when an NFS server is down, I think I have found one reason for that (see
>kern/12609 - I now know it doesn't belong to kern - sorry).
>
>It is the implementation of getcwd (src/lib/libc/gen/getcwd.c). When
>examining the parent dir of a mounted filesystem, getcwd lstats every
>directory entry prior to the mountpoint to find out the name of the
>mountpoint (but it would only need the inode's device to do a rough 
>check).

This should no longer be an issue with FreeBSD 3.x, as the system normally
uses the new _getcwd syscall. The old code is still in getcwd.c, but is
only used if the syscall isn't present (e.g. if running a 3.x executable
on a 2.2 system).

We use the following patch on all our 2.2-stable machines, which works
around the problem. This was submitted as PR bin/6658, but it wasn't
committed, as a backport of 3.x's _getcwd (which never occurred) was
considered to be a more appropriate change.

Ian

--- getcwd.c.orig   Tue Jun 30 15:38:44 1998
+++ getcwd.cTue Jun 30 15:39:08 1998
@@ -36,6 +36,7 @@
 #endif /* LIBC_SCCS and not lint */
 
 #include 
+#include 
 #include 
 
 #include 
@@ -169,7 +170,28 @@
if (dp->d_fileno == ino)
break;
}
-   } else
+   } else {
+   struct statfs sfs;
+   char *dirname;
+
+   /*
+* Try to get the directory name by using statfs on
+* the mount point. 
+*/
+   if (!statfs(up[3] ? up + 3 : ".", &sfs) &&
+   (dirname = rindex(sfs.f_mntonname, '/'))) 
+   while((dp = readdir(dir))) {
+   if (ISDOT(dp))
+   continue;
+   bcopy(dp->d_name, bup, dp->d_namlen+1);
+   if (!strcmp(dirname + 1, dp->d_name) &&
+   !lstat(up, &s) &&
+   s.st_dev == dev &&
+   s.st_ino == ino)
+   goto found;
+   }
+   rewinddir(dir);
+
for (;;) {
if (!(dp = readdir(dir)))
goto notfound;
@@ -187,7 +209,9 @@
if (s.st_dev == dev && s.st_ino == ino)
break;
}
+   }
 
+found:
/*
 * Check for length of the current name, preceding slash,
 * leading slash.





Re: Sun4c as Xterminal - Problems

1999-12-18 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, [EMAIL PROTECTED] writes:
>I'm trying to use a Sun ELC (sun4c) as an Xterminal on my FreeBSD
>system using Xkernel 2.0.  I've used the old howto's from 1996
>(Philippe Regnauld) as well as NetBSD diskless howto's to set this up.

>So, does anyone have a fix for this?  Back in '96-97, Luigi Rizzo and
>Mike Smith (among others) seemed to be doing this, so I'm hoping someone
>still does.  

I think sometime around 3.0, the networking code in FreeBSD stopped
responding to IP broadcasts sent to the 'zero' subnet broadcast address,
which in your case is 209.9.69.0.

We currently work around this on some 3.x machines by adding an alias
address (which can be anything, even not in the same subnet) that has
a broadcast address of our subnet zero address. Try something like:

ifconfig fxp0 inet 10.0.0.1 netmask 0x broadcast 209.9.69.0 alias

Maybe the old behaviour of responding to the subnet zero address should
be available via a sysctl?

Ian





Re: Sun4c as Xterminal - Problems

1999-12-18 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Dan Busarow writes:

>Earlier than that.  2.2.5?  It prevents the machine from being used
>as part of a smurf amplifier.  If you want to change the behaviour
>see
>
>icmp_bmcastecho="NO"    # respond to broadcast ping packets

This is different; the change I was referring to stops FreeBSD from
recognising old-style IP broadcasts as broadcasts. If you have a network
172.16.0.0/16, then 172.16.255.255 is accepted as a broadcast address,
but 172.16.0.0 is not. Diskless Sun machines attempt to use the latter,
so the broadcasts get ignored.
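
To make the two conventions concrete, here is a small sketch that computes both broadcast forms from an address and prefix length. Addresses are plain 32-bit values in host byte order, and the function names are made up for this illustration; nothing here is taken from ip_input.c.

```c
#include <assert.h>
#include <stdint.h>

/* Netmask for a prefix length of 0..32, in host byte order. */
static uint32_t prefix_mask(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    return prefix_len == 0 ? 0 : UINT32_MAX << (32 - prefix_len);
}

/* Modern broadcast address: host bits all ones. */
uint32_t ones_broadcast(uint32_t addr, unsigned prefix_len)
{
    return addr | ~prefix_mask(prefix_len);
}

/* Old-style broadcast address: host bits all zeros. */
uint32_t zeros_broadcast(uint32_t addr, unsigned prefix_len)
{
    return addr & prefix_mask(prefix_len);
}
```

For a host at 172.16.0.1 on 172.16.0.0/16 (0xAC100001), ones_broadcast() gives 0xAC10FFFF (172.16.255.255) and zeros_broadcast() gives 0xAC100000 (172.16.0.0), the address the diskless Suns send to.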

The change is older than I thought though. The code was #ifdef'd out
back in Dec 1995 in v1.33 of sys/netinet/ip_input.c, and was removed
completely in v1.48 (Oct 1996).

Ian





Re: Why was rsh removed from the fixit floppy?

2000-01-20 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Jan Conrad writes:

>When I cloned a new machine, I usually booted with the floppies, set up
>DOS partitions and disk label and then pulled everything over by tar and
>rsh, thereby overwriting fstab etc. with prepared files. Worked pretty
>fast...
>
>What would you suggest how to do it?

Unless this has changed recently, the "Emergency Holographic Shell" option
provides ifconfig and mount_nfs. That should allow you to get all the
commands that you need from an NFS server, without even having to wait
for the fixit floppy to load :)

It's a while since I used this, but I remember doing something like:

set -o emacs
ifconfig fxp0 x.x.x.x netmask x.x.x.x
mount_nfs x.x.x.x:/scratch /mnt

/mnt/bin/ln -s /mnt/usr /usr
/mnt/bin/mv /bin /bin.old
/mnt/bin/mv /sbin /sbin.old
/mnt/bin/ln -s /mnt/bin /bin
/mnt/bin/ln -s /mnt/sbin /sbin

where /scratch on the server contains a minimal /bin, /sbin and
/usr etc. The last few commands could obviously be put in a script on
the server. 

Ian





Re: empty lists in for

2000-03-07 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Chet Ramey writes:

>>  for f in $$empty_list ${SUBDIRS}; do ...

>Not bad, but will break if the shell is run with the `-u' option on
>for some reason.

Ok, how about:

for f in $$IFS ${SUBDIRS}; do ...

Ian





Re: empty lists in for

2000-03-06 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Warner Losh writes:

>: to
>: 
>:  sh_subdirs=${SUBDIRS}; for f in $$sh_subdirs ; do ...
>

>there's lots of other workarounds, from seeing if SUBDIRS is defined,
>to using make's .foreach.

Another option is:

for f in $$empty_list ${SUBDIRS}; do ...

where 'empty_list' is any undefined sh variable.

Ian





Re: PR #10971, not dead yet.

2000-05-31 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, "David E. Cross" writes:

>though.  Especially confusing is the following sequence of events:
>
> 41096 ypserv   CALL  select(0x10,0x8051040,0,0,0xbfbff518)
> 41096 ypserv   PSIG  SIGCHLD caught handler=0x804c75c mask=0x0 code=0x0
...
> 41096 ypserv   RET   sigreturn JUSTRETURN
> 41096 ypserv   CALL  gettimeofday(0xbfbff510,0)
> 41096 ypserv   RET   gettimeofday 0
> 41096 ypserv   CALL  read(0x1c,0x80f3fa0,0xfa0)
> 41096 ypserv   GIO   fd 28 read 4000 bytes
>
>Note that the select returned with -1, with errno set to 4, and it
>did not re-enter the select loop, but just started to read data.  Also note

A quick glance at the RPC library suggests a possible reason for
this sequence. It appears there is a bug in svc_{unix,tcp}.c's
handling of EINTR returns from select() - the code seems to assume
that a 'continue' inside a do-while loop skips the while condition.
Try the patch below (note that I don't use ypserv, I haven't checked
if ypserv uses this code etc etc, so this may have nothing to do
with your problem).

Ian


Index: svc_tcp.c
===================================================================
RCS file: /home/iedowse/CVS/src/lib/libc/rpc/svc_tcp.c,v
retrieving revision 1.18
diff -u -r1.18 svc_tcp.c
--- svc_tcp.c   2000/01/27 23:06:41 1.18
+++ svc_tcp.c   2000/06/01 00:21:26
@@ -360,6 +360,7 @@
 			if (tmp1.tv_sec < 0 || !timerisset(&tmp1))
 				goto fatal_err;
 			delta = tmp1;
+			FD_CLR(sock, fds);
 			continue;
 		case 0:
 			goto fatal_err;
Index: svc_unix.c
===================================================================
RCS file: /home/iedowse/CVS/src/lib/libc/rpc/svc_unix.c,v
retrieving revision 1.7
diff -u -r1.7 svc_unix.c
--- svc_unix.c  2000/01/27 23:06:42 1.7
+++ svc_unix.c  2000/06/01 00:23:25
@@ -402,6 +402,7 @@
 			if (tmp1.tv_sec < 0 || !timerisset(&tmp1))
 				goto fatal_err;
 			delta = tmp1;
+			FD_CLR(sock, fds);
 			continue;
 		case 0:
 			goto fatal_err;
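
The C rule at issue is that `continue` inside a do-while loop jumps to the evaluation of the loop condition, not back to the top of the body. The toy model below shows why that matters for an interrupted select(). Everything in it is made up for illustration: select_calls_until_exit() and its simulated return values are assumptions, and none of it is taken from svc_tcp.c or svc_unix.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simulate a wait loop whose first select() is interrupted (-1) and whose
 * second select() reports the descriptor readable (1). Returns how many
 * simulated select() calls were made before the loop exited.
 *
 * fd_bit plays the role of FD_ISSET(sock, fds); clear_on_eintr plays the
 * role of the FD_CLR(sock, fds) added by the patch.
 */
int select_calls_until_exit(bool clear_on_eintr)
{
    int calls = 0;
    bool fd_bit = true;    /* the caller marked the descriptor of interest */

    do {
        int r = (++calls == 1) ? -1 : 1;    /* hypothetical return values */

        if (r == -1) {
            if (clear_on_eintr)
                fd_bit = false;
            continue;      /* jumps to the while condition below */
        }
        fd_bit = true;     /* select reported the descriptor readable */
    } while (!fd_bit);

    return calls;
}
```

Without the clear, the stale bit makes the condition false on the interrupted call and the loop exits after one call, so the caller goes on to read() a descriptor that was never reported ready. With the clear, the loop runs a second time and exits only after the successful call.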





Re: Possible problem in find_symdef()

2000-06-03 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, [EMAIL PROTECTED] writes:
>[EMAIL PROTECTED]

>it hard to compile it under FreeBSD (however I can 
>compile it under Linux). I get "Buss error" and coredump

It's a simple programming error - you're not initialising the pointer
'q' in main(), so your code is overwriting memory at whatever junk
address ends up in q when main() is invoked. Add a

q = malloc(sizeof(*q));

and it works. The compiler will spot this problem for you if you include
the options '-Wall -O':

> gcc -Wall -O -o q-pr q-pr.c
q-pr.c: In function `main':
q-pr.c:7: warning: `q' might be used uninitialized in this function
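
The original q-pr.c is not shown in this thread, so the following is only a sketch of the general pattern, with a hypothetical struct and function name: allocate storage for the pointer, and check the result, before writing through it.

```c
#include <stdlib.h>

/* Hypothetical record type standing in for whatever 'q' points to. */
struct record {
    int value;
};

/*
 * Allocate a record and initialise it. Returns NULL if malloc() fails.
 * The caller owns the result and must free() it.
 */
struct record *make_record(int value)
{
    struct record *q = malloc(sizeof(*q));    /* q now points at real storage */

    if (q == NULL)
        return NULL;
    q->value = value;    /* safe: q was initialised above */
    return q;
}
```

Writing sizeof(*q) rather than naming the type keeps the allocation size correct even if the declared type of q changes later.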

Ian

