Anything special with kmem_map and mb_map?
I have been wondering about this for some time. There are many kernel submaps: exec_map, clean_map, etc. But if you look at the code in vm_map_find(), we have to call splvm() for kmem_map and its submap mb_map, but not for the other kernel submaps. So is there anything special about these two kernel submaps? Thanks for any help.

-Zhihui

To Unsubscribe: send mail to majord...@freebsd.org with "unsubscribe freebsd-hackers" in the body of the message
understanding code related to forced COW for debugger
I have tried to understand the following code in vm_map_lookup() without much success:

    if (fault_type & VM_PROT_OVERRIDE_WRITE)
        prot = entry->max_protection;
    else
        prot = entry->protection;

    if (entry->wired_count && (fault_type & VM_PROT_WRITE) &&
        (entry->eflags & MAP_ENTRY_COW) &&
        (fault_type & VM_PROT_OVERRIDE_WRITE) == 0) {
        RETURN(KERN_PROTECTION_FAILURE);
    }

At first, it seemed to me that if you want to write to a COW page, you must have OVERRIDE_WRITE set. But later I found that when wired_count is non-zero, we are actually simulating a page fault, not handling a real one. Anyway, I do not know how the above code (1) prevents a debugger from writing to binary code, or (2) forces a COW when a debugger writes other data.

I also have some questions on wiring a page:

(1) According to the man page for mlock(2), a wired page can still cause protection-violation faults. But in the same vm_map_lookup(), we have the following code:

    if (*wired)
        prot = fault_type = entry->protection;

and the comment says "get it for all possible accesses". As I understand it, we wire a page by simulating a page fault (no matter whether it is the kernel or a user who is wiring the page).

(2) Can the kernel wire a page of a user process without that user's request (by calling mlock)?

Any help is appreciated.
Re: understanding code related to forced COW for debugger
On Wed, 21 Jul 1999, Matthew Dillon wrote:

> The VM_PROT_OVERRIDE_WRITE flag is only used for user-wired pages, so
> it does not affect 'normal' page handling. Look carefully at the
> vm_fault() code (vm/vm_fault.c line 212), that lookup only occurs
> with VM_PROT_OVERRIDE_WRITE set if the normal lookup fails and the
> user has wired the page.
>
> So if a normal lookup fails and this is a user-wired page, we try
> the lookup again with VM_PROT_OVERRIDE_WRITE, presumably to handle
> a faked copy-on-write fault for the debugger. This results in the
> following:
>
> First, we temporarily increase the protections to make the page *appear*
> writeable. Note: only 'appear' writeable, not actually be writeable.
>
>     if (fault_type & VM_PROT_OVERRIDE_WRITE)
>         prot = entry->max_protection;
>     else
>         prot = entry->protection;

To allow a debugger to write to the TEXT area of a program, the max_protection field must be set by the loader to include VM_PROT_WRITE. Am I right?

>     *wired = (entry->wired_count != 0);
>     if (*wired)
>         prot = fault_type = entry->protection;
>
> I'm pretty sure this piece is simply reverting the mess that the
> copy-on-write stuff does for the debugger. entry->protection is what
> we normally want to use.

Both vm_fault_wire() and vm_fault_user_wire() see a non-zero wired_count in the related map entry before calling vm_fault(). This is set up by their callers, vm_map_pageable() and vm_map_user_pageable(). Since you are talking about the user wiring case, for the kernel wiring case the above code should prevent any further fault on the page after this simulated one. Therefore, a kernel-wired page will never cause protection-violation faults, while a user-wired page can, as the man page for mlock(2) says. Since mlock(2) is used by the user, this makes sense to me.

Thanks for your response.

-Zhihui
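For readers who want to experiment with the logic being discussed, here is a small user-space model of only the three fragments quoted in this thread. It is a sketch, not the kernel's vm_map_lookup(): the struct, the function name model_lookup(), the numeric values of the constants, and the ordering of the fragments are all assumptions made for illustration, and the real routine does more checking than is shown here.

```c
#include <stdbool.h>

/* Stand-in values for illustration only; the real definitions live in
 * the kernel's VM headers and may differ. */
#define VM_PROT_READ            0x01
#define VM_PROT_WRITE           0x02
#define VM_PROT_OVERRIDE_WRITE  0x08
#define MAP_ENTRY_COW           0x04

#define KERN_SUCCESS            0
#define KERN_PROTECTION_FAILURE 2

/* Hypothetical, trimmed-down map entry holding only the fields that
 * the quoted fragments touch. */
struct model_entry {
    int protection;      /* current protection */
    int max_protection;  /* upper bound on protection */
    int wired_count;     /* non-zero if the entry is wired */
    int eflags;          /* entry flags, e.g. MAP_ENTRY_COW */
};

static int
model_lookup(const struct model_entry *entry, int fault_type,
             int *out_prot, bool *wired)
{
    int prot;

    /* Fragment 1: with the override flag, look at max_protection. */
    if (fault_type & VM_PROT_OVERRIDE_WRITE)
        prot = entry->max_protection;
    else
        prot = entry->protection;

    /* Fragment 2: a write fault on a wired COW entry is refused
     * unless the override flag is present. */
    if (entry->wired_count && (fault_type & VM_PROT_WRITE) &&
        (entry->eflags & MAP_ENTRY_COW) &&
        (fault_type & VM_PROT_OVERRIDE_WRITE) == 0)
        return KERN_PROTECTION_FAILURE;

    /* Fragment 3: for a wired entry, fall back to entry->protection. */
    *wired = (entry->wired_count != 0);
    if (*wired)
        prot = entry->protection;

    *out_prot = prot;
    return KERN_SUCCESS;
}
```

In this model, a wired COW entry rejects a plain write fault, while the same fault with VM_PROT_OVERRIDE_WRITE set gets past the check and then has its protection reset to entry->protection by the wired case.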
Questions on new-bus source code
In the FreeBSD new-bus architecture, all devices are linked into a device tree. The root of the tree is root_bus; it has a child called nexus0, added during the device configuration phase. I have two questions about this new-bus code:

(1) What is the purpose of this "nexus0" device? Its parent (root_bus) does not declare the probe method, so probing nexus0 can only return ENXIO for us (from error_method()).

(2) I guess that the probe process for all devices on the tree is triggered by root_bus_configure() in subr_bus.c. It is done from top to bottom, i.e. the probe process should be propagated down the device tree from root_bus. Am I right? How does this tree structure achieve the dynamic side of device configuration (adding/removing devices on the fly)?

Having a big picture often helps in understanding the details more readily. Any help is appreciated.

------
Zhihui Zhang. Please visit http://www.freebsd.org
Configuration mechanism of PCI bus
Even with "PCI System Architecture, 4th edition" at hand, I still have some problems understanding the code in isa/pcibus.c. Please point out any misunderstanding I may have in the following:

(1) First, you can not modify the address port at 0xcf8 without a FULL 32-bit write. The routine pci_cfgopen() seems to rely on this fact.

(2) The constant CONF1_ENABLE_MSK includes the 4 higher bus number bits, so only 4 bits can be used as the bus number and we can have at most 16 PCI buses.

(3) The variable "mode1res" seems to refer to any residue left by the BIOS in the address port. If it is non-zero, we will try to find a device using configuration mechanism 1.

(4) The magic constant 0xf870ff excludes many devices. How is it chosen? I guess those excluded devices are not important or not supported by FreeBSD. It seems to me that if pci_cfgcheck() finds at least one device, then the configuration mechanism is regarded as correctly detected.

Any help is appreciated.

----------
Zhihui Zhang. Please visit http://www.freebsd.org
Create a dump image of kernel
Can anyone tell me how to modify the config file to build a kernel that creates a dump image whenever it panics? Currently I have to use the dumpon command after system bootup. But this command does not help when the panic happens during bootup, i.e., when you have no chance to issue the dumpon command. Thanks.

--
Zhihui Zhang. Please visit http://www.freebsd.org
Re: Create a dump image of kernel
On Fri, 13 Aug 1999, Andrzej Bialecki wrote:

> On Fri, 13 Aug 1999, Zhihui Zhang wrote:
>
> > Can anyone tell me how to modify the config file to build a kernel that
> > creates dump image whenever it panics. Currently I have to use dumpon
> > command after system bootup. But this command does not work when the
> > panic happens during the bootup time, i.e., when you have no chance to
> > issue the dumpon command. Thanks.
>
> This is a common problem recently, it seems.. See my recent postings to
> this group (or was it -current?).
>
> Andrzej Bialecki

It is on the -current list. The subject is "is dumpon/savecore broken?". I read your postings there. It seems we can use remote GDB to debug a kernel that panics even before it probes the devices. I hope it is easy to learn how to use it from the handbook. Thanks.

-Zhihui
Need help with kernel trace
I think it helps in understanding a routine in the kernel code if I know how the routine is called and what parameters are being passed to it. To get such information, I decided to simulate a panic whenever that routine is called. For example, I want to know how link() in vfs_syscalls.c is called and what parameters are being passed to it. I added a sysctl variable named "debug.link_panic" and, at the very beginning of link(), I added the following statement:

    if (link_panic)
        panic("link() is called");

As expected, the system panics whenever I set debug.link_panic to 1 and issue an ln command at the prompt. Now the problem is how to use the core dump to get the information I am interested in. The following script records what I tried:

now5# cd /usr/crash
now5# gdb -k -s /usr/crash/kernel.gdb kernel.4 vmcore.4
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
IdlePTD 3588096
kernel symbol `gd_curpcb' not found.
(kgdb) where
No stack.

I expected to get a stack so that I could trace, step by step, how link() is called. But it seems that I can not do so.

The kernel was configured with "config -g" and installed with "make install" after doing "strip -g kernel". The file kernel.gdb was copied from the directory /usr/src/sys/compile/DDB to /usr/crash before being stripped. /var/crash is too small, so I modified /etc/rc so that savecore saves core dumps under /usr/crash. The system is running FreeBSD 3.2-RELEASE.

Any help is appreciated.

--
Zhihui Zhang. Please visit http://www.freebsd.org
kernel symbol `gd_curpcb' not found
I have tried to debug a kernel by simulating a panic, without success. I have read the handbook and searched the mailing list. I even tried not stripping the debug kernel at all. Still I get the above message and I do not know how to go on. The following are the commands that I used:

now5# gdb -k
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
..
This GDB was configured as "i386-unknown-freebsd".
(kgdb) symbol-file /kernel
Reading symbols from /kernel...done.
(kgdb) exec-file kernel.6
(kgdb) core-file vmcore.6
IdlePTD 3600384
kernel symbol `gd_curpcb' not found.
(kgdb) where
No stack.

Thanks for any help.

------
Zhihui Zhang. Please visit http://www.freebsd.org
Kernel debugging questions
I am using FreeBSD 4.0 and have two questions on kernel debugging:

(1) Can I specify /usr/src/sys/compile/MYKERN/kernel.debug as the kernel to boot from manually, without copying that file under /? It seems I can not do so. I guess the reason is that /usr is not mounted at that time.

(2) After bootup, I tried the following to debug the live system (after reading some pages of the book "Panic! Unix system crash dump analysis"):

now4# gdb -k /kernel.debug /dev/mem
(kgdb) run
Starting program: /kernel.debug
Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
You can't do that without a process to debug.

Is there something wrong? I did the same thing with the postmortem core dump files and got similar messages. Maybe I am using gdb in the wrong way.

Any help is appreciated.

------
Zhihui Zhang. Please visit http://www.freebsd.org
Re: Kernel debugging questions
On Fri, 20 Aug 1999, Greg Lehey wrote:

> You can't control the execution of the kernel, you can just look at
> the way things are. With the core dump, you at least have the
> advantage that things won't change while you look at them; you can't
> even do that with /dev/mem. The other alternative is remote serial
> debugging, where you *can* influence the execution of the kernel, for
> example by setting breakpoints. But remember that the kernel is
> already running when you attach to it, so you don't say 'run', you say
> 'c[ontinue]'.

Thanks for your response. I would not have thought of those points myself. However, page 7 of the book "Panic! Unix system crash dump analysis" says that a debugger named kadb in SunOS can load the real kernel during boot and treat it like a great, big user program, stepping through its execution, examining and modifying values on the fly. It seems to me that FreeBSD does not have such a debugger. Maybe ddb can do so, but it works at the assembly level.

-Zhihui
Serial cable
Hi, Rich: Can you find a serial cable for me? I need to connect two PCs together via RS232 ports. Thanks.

--
Zhihui Zhang. Please visit http://www.freebsd.org
Questions for vnconfig
I have successfully used vnconfig to add a swap file and to mount disk image files. However, I am still not sure about the following two things:

(1) What does the count in "pseudo-device vn count" stand for? My guess is that if it is 2, then we can use /dev/vn0x and /dev/vn1x. If it is 1, then we can only use /dev/vn0x. The x stands for one of the eight partitions [a-h] in one slice.

(2) For /dev/vn0[a-h], which one of a-h should I use for which purpose?

Any help is appreciated.

------
Zhihui Zhang. Please visit http://www.freebsd.org
What does unp stand for?
In file uipc_usrreq.c, there are many routines beginning with unp_. For example, unp_connect(), unp_bind(), etc. What does unp stand for? Thanks.

--
Zhihui Zhang. Please visit http://www.freebsd.org
FreeBSD FIFO implementation
While looking at the FIFO implementation, I understand that a FIFO is implemented as a socket. But I am not sure where the data in a FIFO is stored (mbuf or filesystem buf structure?) and how it manages the read/write pointers. Can anyone give me a general picture of this? Any help is appreciated.

--
Zhihui Zhang. Please visit http://www.freebsd.org
Help with remote debugging (gdb -k)
After reading the handbook and some postings in the mailing list archive, I still can not make remote debugging work. I basically did the following on FreeBSD-current 4.0 (A is the debugging machine, B is the target):

(1) Build a debug kernel (options DDB and BREAK_TO_DEBUGGER) on box A. The sio flag I used is 0x90 (I also tried 0x80). Ftp the file /kernel to box B and rename it to /kernel.A

(2) Boot the kernel /kernel.A on box B with the -d option:

>>FreeBSD/i386 boot
Default: 0:wd(0,a)/boot/loader
boot: /kernel.A -d
Debugger("Boot flags requested debugger")
Stopped at 0xc0252c27: movl $0, 0xc031ed98
db> gdb
Next trap will enter GDB remote protocol mode
db> s

(3) On machine A, go to the compile directory:

#gdb -g kernel.debug
(kgdb) target remote /dev/cuaa0
Remote debugging using /dev/cuaa0
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Couldn't establish connection to remote target
Malformed response to offset query, timeout

The serial cable is null-modem and has been tested with kermit. It is connected to /dev/ttyd0 (com 1) of machine B and com 2 of machine A.

I did not do "strip -x" because I assume this is done by FreeBSD 4.0 automatically and the file kernel.debug is the one with symbols.

Any help is appreciated.

------
Zhihui Zhang. Please visit http://www.freebsd.org
Re: Help with remote debugging (gdb -k)
On Mon, 30 Aug 1999, Zhihui Zhang wrote:

> After reading the handbook and some postings in the mailing list archive.
> I still can not make remote debugging work. I basically did the following
> on FreeBSD-current 4.0 (A is debugging machine, B is the target):
>
> (1) Build a debug kernel (options DDB and BREAK_TO_DEBUGGER) on box A.
> The sio flag I used is 0x90 (I also tried 0x80). Ftp the file /kernel to
> box B and renamed as /kernel.A
>
> (2) Boot the kernel /kernel.A on box B with -d option:
>
> >>FreeBSD/i386 boot
> Default: 0:wd(0,a)/boot/loader
> boot: /kernel.A -d
> Debugger("Boot flags requested debugger")
> Stopped at 0xc0252c27: movl $0, 0xc031ed98
> db> gdb
> Next trap will enter GDB remote protocol mode
> db> s
>
> (3) On machine A, go to the compile directory:
>
> #gdb -g kernel.debug
>
> (kgdb) target remote /dev/cuaa0
>
> Remote debugging using /dev/cuaa0
> Ignoring packet error, continuing...
> Ignoring packet error, continuing...
> Couldn't establish connection to remote target
> Malformed response to offset query, timeout
>
> The serial cable is null-modem and has been tested with kermit. It is
> connected to /dev/ttyd0 (com 1) of machine B and com 2 of machine A.
>
> I did not do "strip -x" because I assume this is done by FreeBSD 4.0
> automatically and the file debug.kernel is the one with symbols.
>
> Any help is appreciated.

I have just found the reason. I should specify the local serial port of the debugging machine, which is com 2 on machine A. So I should use:

(kgdb) target remote /dev/cuaa1   <-- do not use /dev/cuaa0

Now everything works fine.

-Zhihui
Re: Help with remote debugging (gdb -k)
> > On Mon, 30 Aug 1999, Zhihui Zhang wrote:
> > > (3) On machine A, go to the compile directory:
> > >
> > > #gdb -g kernel.debug
>
> -g?

This is a typo. It should be "gdb -k kernel.debug". I have just posted another message pointing out my mistakes. Thanks for your response.

-Zhihui
Re: Problems with FIFO open in non-blocking mode?
On Mon, 6 Sep 1999, Alex Povolotsky wrote:

> Hello!
>
> The following program
>
> #include
> #include
>
> main() {
>     int control;
>     if ((control = open("STATUS",O_WRONLY|O_NONBLOCK))<0) {
>         perror("Could not open STATUS ");
>         exit(1);
>     }
>     printf("STATUS ready\n");
>     close(control);
>     return(0);
> }
>
> fails to run (STATUS is pre-created FIFO file) with error "Device not
> configured", which seems kinda odd for me.
>
> However, when FIFO is opened with O_RDWR and O_NONBLOCK, every attempt
> to select(2) its handler for writing doesn't wait until someone opens
> FIFO for reading, but instead FIFO is ready to write at every select.
>
> Is it a bug or a feature?

I answered a similar question some time ago. You can search the mailing list archive for it. Basically, you need to read "Advanced Programming in the UNIX Environment" by Stevens. I can not remember every detail right now. The "Device not configured" error is expected: it is the message for ENXIO, which open() returns when a FIFO is opened write-only with O_NONBLOCK and no process has it open for reading.

-Zhihui
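The behaviour described in this exchange can be reproduced with a short self-contained function. This is only a sketch: the function name fifo_writer_open_errno() and the temporary directory template are made up for the demonstration, and the FIFO is created on the fly rather than pre-created as in the original report.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a FIFO in a fresh temporary directory and try to open it
 * write-only and non-blocking while nobody has it open for reading.
 * Returns the errno from open(), 0 if open() unexpectedly succeeded,
 * or -1 if the setup itself failed.
 */
int
fifo_writer_open_errno(void)
{
    char dir[] = "/tmp/fifodemoXXXXXX";   /* hypothetical template */
    char path[sizeof(dir) + 8];
    int fd, err = 0;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof(path), "%s/STATUS", dir);
    if (mkfifo(path, 0600) != 0) {
        rmdir(dir);
        return -1;
    }

    fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        err = errno;        /* save before cleanup can change it */
    else
        close(fd);

    unlink(path);
    rmdir(dir);
    return err;
}
```

Calling fifo_writer_open_errno() and passing the result to strerror() should give the ENXIO message, which the BSDs word as "Device not configured". Opening the FIFO read-only (or having a reader already attached) before the write-only open avoids the error.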
The usage of MNT_RELOAD
The flag MNT_RELOAD is not documented in the mount manpages. From the source code, I find that it is always used along with MNT_UPDATE, which can be specified by the user (-u option). Can anyone explain the usage of MNT_RELOAD for me? It does not seem to be used normally. Any help is appreciated.

--
Zhihui Zhang. Please visit http://www.freebsd.org
Re: The usage of MNT_RELOAD
On Wed, 8 Sep 1999, Luoqi Chen wrote:

> > The flag MNT_RELOAD is not documented in mount manpages. From the source
> > code, I find that it is always used along with MNT_UPDATE which can be
> > specified by user (-u option). Can anyone explain the usage of MNT_RELOAD
> > for me? It seems not to be used normally.
> >
> It is created almost exclusively for fsck (and similar programs) to update
> the in core image of the superblock (of / in single user mode) after the
> on disk version has been modified.

Does fsck have to run on a MOUNTED filesystem? If so, your answer makes sense to me: if fsck modifies the on-disk copy of the superblock, it does not have to unmount and then remount the filesystem; it only needs to reload the superblock from disk.

-Zhihui
Using gdb with fork()
I am using gdb 4.18 on FreeBSD-current. The program being debugged consists of two small files: test1.c and test2.c. The main() in test1.c has a call to fork() and, in the child process case, it calls a routine, say test(), in test2.c.

I use the "set follow-fork-mode child", "break fork", and "step" commands, trying to reach the source in test2.c, without success. The program is compiled with "cc -g test1.c test2.c" and I run gdb with "gdb a.out".

If there is no fork(), a call from test1.c to a routine in test2.c will bring up the source of test2.c if I step into that routine. Why does it not work with fork()? Am I missing something?

Thanks for any help.

------
Zhihui Zhang. Please visit http://www.freebsd.org
How to follow child process in gdb
On Wed, 8 Sep 1999, Kip Macy wrote:

> You need to detach from your current process and attach to the spawned
> process. It might make it easier to attach in a timely fashion if you put
> a 3 second sleep in right after the fork. This would all be easiest using
> something like DDD where DDD will tell you what other processes are
> running with the same name, and allow you to attach to them through the
> GUI.

In dbx on a Sun workstation, all I need to do to follow a child process after fork() is to use the following command in advance:

(dbx)dbxenv follow_fork_mode child

Your response suggests that I can not achieve the same result simply by using (I am using gdb 4.18):

(gdb)set follow-fork-mode child

I have to use attach and detach to do so. Does that mean I have to display the pid of the new process in order to follow it? And I have to modify the child process so that it waits until I can attach to it. That will not be as easy.

-Zhihui

> On Wed, 8 Sep 1999, Zhihui Zhang wrote:
>
> > I am using gdb 4.18 on FreeBSD-current. The program being debugged
> > consists of two small files: test1.c and test2.c. The main() in test1.c
> > has a call to fork() and for the child process case, it will call a
> > routine, say test(), in test2.c.
> >
> > I use "set follow-fork-mode child", "break fork", "step" command trying to
> > access the source in test2.c without success. The program is compiled
> > with "cc -g test1.c test2.c" and I run gdb with "gdb a.out".
> >
> > If there is no fork(), a call from test1.c to a routine in test2.c will
> > bring up the source of test2.c if I step that routine. Why it does not
> > work with fork()? Am I missing something?
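The attach-based workaround suggested above needs two things from the child: its pid, and a pause long enough to attach. A minimal sketch of such a helper follows; the name pause_for_attach() and the message format are invented here, and the delay is whatever the person debugging finds comfortable.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Print the calling process's pid on stderr and sleep for the given
 * number of seconds, so that a debugger can be attached in the
 * meantime. Returns the pid that was printed.
 */
pid_t
pause_for_attach(unsigned int seconds)
{
    pid_t pid = getpid();

    fprintf(stderr, "pid %ld: waiting %u s for a debugger to attach\n",
        (long)pid, seconds);
    sleep(seconds);
    return pid;
}
```

In the child branch after fork(), calling pause_for_attach(10) just before test() would leave time to run gdb's "attach" command with the printed pid from a second gdb session started on a.out, and then set a breakpoint on test() before continuing.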
Let a daemon process print a message
Can anyone tell me how to let a daemon process print a message to the console? Adding printf() does not work (I wonder if a daemon process has been cut off from stdout). Thanks for any help.

--
Zhihui Zhang. Please visit http://www.freebsd.org
Re: Let a daemon process print a message
On Mon, 13 Sep 1999, Brian Mitchell (ISSATL) wrote:

> syslog() with the proper facility is probably the best way to do this.
> Another possibility is opening /dev/console, but I think that will acquire
> a controlling terminal.
>
> On Mon, 13 Sep 1999, Zhihui Zhang wrote:
>
> > Can anyone tell me how to let a daemon process print a message to the
> > console? Adding printf() does not work (I wonder if a daemon process
> > has been cut of relationship with stdout). Thanks for any help.

I have tested syslog(). I found out:

(1) The log messages go into /var/log/messages and appear on the console only after I log in (as root).

(2) The LOG_INFO priority does not cause the messages to appear on the console or to be written into the file /var/log/messages.

Can anyone explain the reason for me? Thanks a lot.

-Zhihui
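For reference, a daemon-side logging call along the lines suggested above might look like the following sketch. The identifier "mydaemon" and the function name log_from_daemon() are placeholders. Where each message ends up (console, /var/log/messages, or nowhere) is decided by the selectors in syslogd's configuration file, normally /etc/syslog.conf, so a LOG_INFO message is dropped if no selector there matches its facility and level.

```c
#include <syslog.h>

/*
 * Send the same text at two priorities under the daemon facility.
 * LOG_PID tags each message with the process id; LOG_CONS asks the
 * library to write to the console only if the message cannot be
 * delivered to the system logger.
 */
int
log_from_daemon(const char *msg)
{
    openlog("mydaemon", LOG_PID | LOG_CONS, LOG_DAEMON);  /* placeholder name */
    syslog(LOG_ERR, "%s", msg);    /* higher priority */
    syslog(LOG_INFO, "%s", msg);   /* lower priority; may be filtered out */
    closelog();
    return 0;
}
```

Comparing which of the two messages shows up, and where, against the facility.level selectors in the local syslog.conf is a quick way to see which rule is responsible for observation (2).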
NFS authentication
I am wondering where the NFS authentication is done in FreeBSD. Is it done by the NFS daemon mountd (or other daemon) or within the kernel? Can anyone give me a pointer? Thanks a lot.

--
Zhihui Zhang. Please visit http://www.freebsd.org
Multiple routes to the same destination
According to the 4.4 BSD book (page 423), 4.4 BSD does not support multiple routes to the same destination (identical key and mask). Does the radix tree code in FreeBSD 4.0 have the same limitation? I am wondering if there is already a solution for this. Any help is appreciated.

--
Zhihui Zhang. Please visit http://www.freebsd.org
Metablock caching & negative block #
It seems to me that metablocks such as the filesystem superblock and the cylinder group control blocks are associated with the device vnode. The indirect blocks are associated with the file that uses them to find its data blocks. These indirect blocks are identified by negative block numbers. This limits the maximum file size to 2^31 * 2^9 bytes, because we need one bit in the block number to cope with negative block numbers. When I first understood this I thought it was cool, because it allows buffering both kinds of data in the same way while still telling them apart.

Now my question is: why must we associate these (double, triple) indirect blocks with the file using them? If these indirect blocks could be handled like other metablocks (superblocks, cylinder group control blocks), we could save one bit and make the maximum file size 2^32 * 2^9 bytes.

By the way, all other metablocks seem to be delay-written. In other words, they are not written synchronously. What happens if the system crashes before their updates go to disk? I read on the mailing list that FreeBSD metadata I/O is conservative. Can anyone describe this a little bit for me?

Any help is appreciated.

--
Zhihui Zhang. Please visit http://www.freebsd.org
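To make the two limits in this post concrete, the arithmetic can be written out as a tiny helper. It simply takes the post's figures at face value (a block number with a given count of usable bits, addressing units of 2^9 bytes); the function name max_file_bytes() is invented for the illustration and nothing here is taken from the filesystem code.

```c
#include <stdint.h>

/*
 * Largest byte count addressable when block numbers have
 * block_number_bits usable bits and each block number addresses
 * 2^block_shift bytes. The sum of the two arguments must be below 64.
 */
uint64_t
max_file_bytes(unsigned int block_number_bits, unsigned int block_shift)
{
    return (uint64_t)1 << (block_number_bits + block_shift);
}
```

With these inputs, max_file_bytes(31, 9) is 2^40 bytes (1 TiB), and max_file_bytes(32, 9) is 2^41 bytes (2 TiB), so the bit spent on the sign halves the limit.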
VOP_LEASE(...) or (void)VOP_LEASE(...)?
VOP_LEASE(...) always returns 0, so there is no actual need to check its return value. But it still has a return value. So should we use (void)VOP_LEASE(...) instead of just VOP_LEASE(...)?

BTW, I guess that the practice of modifying default_vnodeop_p[VOFFSET(vop_lease)] in nfs_init() is a hack. Why do we not use

    { &vop_lease_desc, (vop_t *) nqnfs_vop_lease_check },

instead of

    { &vop_lease_desc, (vop_t *) vop_null },

in nfsv2_vnodeop_entries[] in file nfs_vnops.c?

Thanks for any help.

------
Zhihui Zhang. Please visit http://www.freebsd.org
cylinder group and special device
Can anyone answer the following two questions for me:

(1) Does a cylinder group in FFS have to begin at a cylinder boundary?

(2) If we read a block via a special device name (/dev/xxx), will the block be buffered as normal file data and used when we need the block again?

Thanks for any help.

--
Zhihui Zhang. Please visit http://www.freebsd.org
What does VOP_WHITEOUT() do?
Can anyone tell me what VOP_WHITEOUT() does? I can not find it in the hypertext manual pages. Thanks.

--
Zhihui Zhang. Please visit http://www.freebsd.org
open a file for read and write
If I want to read and write a file, I can do it in two ways:

(1) Open the file for reading and writing, using one file descriptor.

(2) Open the file read-only and open it again write-only, using a total of two file descriptors.

Method (2) is clearer in its logic and uses a little more resource (file descriptors). Other than these, are there any performance reasons for doing so? Method (2) is used in the source file mkfs.c when we open a special device file to create a file system.

Thanks for any help.

-Zhihui

--
Zhihui Zhang. Please visit http://www.freebsd.org
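One practical difference between the two methods is that each open() yields a descriptor with its own file offset, so a write-only and a read-only descriptor do not disturb each other's position, whereas a single O_RDWR descriptor would need an lseek() between writing and reading the same region. The following sketch shows method (2) on an ordinary file; the function name and the file mode are chosen only for the example, and it says nothing about performance.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write msg through a write-only descriptor, then read it back through
 * a separate read-only descriptor opened on the same path. Because the
 * two descriptors keep independent offsets, no seek is needed.
 * Returns the number of bytes read, or -1 on failure.
 */
ssize_t
write_then_read_two_fds(const char *path, const char *msg,
    char *buf, size_t buflen)
{
    size_t len = strlen(msg);
    ssize_t n = -1;
    int wfd, rfd;

    wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (wfd < 0)
        return -1;
    rfd = open(path, O_RDONLY);
    if (rfd < 0) {
        close(wfd);
        return -1;
    }

    if (write(wfd, msg, len) == (ssize_t)len)
        n = read(rfd, buf, buflen);   /* rfd's offset is still 0 */

    close(rfd);
    close(wfd);
    return n;
}
```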
mmap of a network buffer
I really do not know how to describe the problem. But a friend here asked me how to mmap a network buffer so that there is no need to copy the data from user space to kernel space. We are not sure whether FreeBSD can create a device file (mknod) for a network card. If it can, we could use the mmap() call, because mmap() requires a file descriptor, and we assume that the file descriptor can be acquired by opening the network device. If this is infeasible, is there another way to accomplish the same goal?

Thanks for any enlightenment.

-Zhihui

--
Zhihui Zhang. Please visit http://www.freebsd.org
A bug in namei cache?
Suppose you want to mv a directory file (with subdirectories) to another name (it is like grafting a subtree onto another point). The namecache entries associated with the source directory file will be purged by calling cache_purge() (done in ufs_rename()?). However, the routine cache_purge() does not purge cache entries recursively down the subtree. Will this result in a lot of stale entries in the namecache? FreeBSD 3.1 no longer allows stale entries in the namei cache (FreeBSD 2.2.8 does). Thanks for any help.

--
Zhihui Zhang. Please visit http://www.freebsd.org
Re: File system gets too fragmented ???
> It might help somewhat if a file that grows by a fragment can allocate
> the free fragment immediately preceding it instead of being relocated
> to a fresh block. I don't know if FFS does this or not.

Really? FFS allocates free fragments with a bitmap, so it should be able to find free fragments anywhere.

-Zhihui
Re: A bug in namei cache? (stale entries)
On 27 May 1999, Ville-Pertti Keinonen wrote:

> zzh...@cs.binghamton.edu (Zhihui Zhang) writes:
>
> > Suppose you want to mv a directory file (with subdirectories) to another
> > name (it is like grafting a subtree to another point), the namecache
> > associated with the source directory file will be purged by calling
> > cache_purge() (done in ufs_rename()?). However, the routine cache_purge()
> > does not purge cache entries recursively down the subtree. Will this
> > result in a lot of stale entries in the namecache? FreeBSD 3.1 no longer
>
> The name cache only caches component names, not paths, so the entries
> are still valid.

Thanks for your reply. I understand now that the namecache only acts on individual component names, not on the entire pathname. The following is based on my understanding.

Suppose you have a directory hierarchy a -> b -> c. In a and b, we have the following files:

a: ., .., a1, a2, a3, b (a1, a2, a3 are not directory files)
b: ., .., b1, b2, b3, c (b1, b2, b3 are not directory files)

If I do a "mv a a_new", then the cache entries for a, a1, a2, a3, and b will be purged from the cache. Although b is purged from the namecache, we can still find it by other means (e.g. ufs_ihashget() called by ffs_vget()). So the entries for b1, b2, b3, and c are still useful, and the namei cache will not contain any stale entries. Am I right?

-Zhihui
Algorithm used to delete part of a file
I am wondering what happens to the underlying data blocks and indirect blocks of a file if I delete a part of the file, that is, how these blocks are re-organized. I have no idea which source code I should look into to understand this. Maybe I should read the source code for vi or another editor. I hope someone can suggest a better way to understand this or briefly describe the algorithm. Any help is appreciated.

-Zhihui

-- Zhihui Zhang. Please visit http://www.freebsd.org
Re: Algorithm used to delete part of a file
On Fri, 28 May 1999, Christopher R. Bowman wrote:

> It is difficult to understand if you are talking at the file system layer
> (because you mention data and indirect blocks) or the application layer
> (you mention looking at the vi code). At the file system layer you don't
> delete blocks in the middle of a file. You can append to the file, thus
> allocating new data blocks (and perhaps indirect blocks if they are
> needed) that will be added to the end of the file. Or you can truncate a
> file, thus freeing the data blocks (and perhaps indirect blocks) at the
> end of the file. When you truncate a file the data blocks are returned to
> a list of free blocks, and when a block is later reused for another
> purpose it is either written to in its entirety or zero filled and then
> partially filled with your data (if you don't write the entire block). In
> either case blocks are never added or removed except at the end of the
> file, thus blocks never have to be "re-organized." They are simply
> allocated or freed. If this is the level of your interest then looking at
> vi source code won't help you.

Thanks for your valuable information. This explains why I have not found any routines in the files under /ufs/ffs and /ufs/ufs that re-organize the on-disk image of a file in that way. If a middle part of a file is deleted, then all of the remaining part of the file must be read by an editor (such as vi) and written out to another place before the file length is truncated. This algorithm does not seem very efficient. But a disk is not like memory, where we can simply modify pointers to point to new locations, so I guess there may be no better way to do this. If you have any ideas about why this is not done by the filesystem itself, please let me know.

Thanks for your help.

Zhihui
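For what it is worth, here is a minimal user-level sketch of one way an application could do this in place, assuming only the standard pread(), pwrite() and ftruncate() calls. It slides the tail of the file down over the deleted range and then truncates, so the file system only ever frees blocks at the end of the file. The names delete_range() and delete_range_demo() are made up for this illustration; this is not how vi or any particular editor does it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Remove `len` bytes starting at offset `off` from the file open on `fd`
 * by copying the tail of the file forward and then truncating.
 * Hypothetical helper. Returns 0 on success, -1 on error.
 */
int
delete_range(int fd, off_t off, off_t len)
{
    char buf[8192];
    off_t end, src, dst;
    ssize_t n, w, done;

    assert(off >= 0 && len >= 0);
    if ((end = lseek(fd, 0, SEEK_END)) < 0)
        return (-1);
    if (off >= end)
        return (0);                 /* range starts at or past EOF */
    src = (len > end - off) ? end : off + len;  /* first surviving byte */
    dst = off;                      /* where that byte must end up */

    /* Copy the tail forward, one chunk at a time. */
    while ((n = pread(fd, buf, sizeof(buf), src)) > 0) {
        for (done = 0; done < n; done += w) {
            w = pwrite(fd, buf + done, (size_t)(n - done), dst + done);
            if (w <= 0)
                return (-1);
        }
        src += n;
        dst += n;
    }
    if (n < 0)
        return (-1);

    /* Blocks are released only here, at the end of the file. */
    return (ftruncate(fd, dst));
}

/*
 * Small self-check: write "0123456789" to a temporary file, delete the
 * four bytes at offset 3, and verify that "012789" is left.
 * Returns 0 if the result is as expected, -1 otherwise.
 */
int
delete_range_demo(void)
{
    char path[] = "/tmp/delrangeXXXXXX";
    char back[16];
    int fd, rc = -1;

    if ((fd = mkstemp(path)) < 0)
        return (-1);
    unlink(path);                   /* file persists until close() */
    if (write(fd, "0123456789", 10) == 10 &&
        delete_range(fd, 3, 4) == 0 &&
        pread(fd, back, sizeof(back), 0) == 6 &&
        memcmp(back, "012789", 6) == 0)
        rc = 0;
    close(fd);
    return (rc);
}
```

The cost is proportional to the amount of data after the deleted range, which matches the observation that this is not very efficient for a deletion near the start of a large file.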
Re: Algorithm used to delete part of a file
On Sat, 29 May 1999, Duncan Barclay wrote:

> Primarily the file system is a "block" oriented storage medium, where a
> "block" is the fragment size or a file system block. Addressing in the
> filesystem is done on a block by block basis. As each block is a number
> of bytes we cannot use byte addressing to simply move pointers around.
>
> If you find the papers written by Rob Pike on the editor "Sam" under
> Plan 9 he goes into a lot of detail about algorithms for removing/adding
> bytes into a storage area with block addressing.

Thanks. I have found a paper named "The Text Editor sam" by Rob Pike (1987) at http://plan9.bell-labs.com/cm/cs/papers.html.

-Zhihui
question about vnode and inode locking
It seems to me that we can lock at the vnode layer AND at the inode layer. Since an inode is always associated with a vnode, and is accessed via its vnode, I do not see the reason why we should lock the inode after having locked the vnode. Can anyone help me with this? Thanks a lot.

-Zhihui
Accessing special device files
I wrote a small program to read/write each FreeBSD partition via special device file names, e.g. /dev/wd0s2e, /dev/rwd0s2e, etc. I have two questions about doing this:

(1) If I try to read() on these files, the buffer size must be given in multiples of 512 (the sector size). Otherwise, I get an EINVAL error. Why is this the case? Does the same thing happen with the write() system call?

(2) When I use lseek() on these device files, it returns the correct offset, but the seek does not actually seem to work. I read in a recent posting that you can't expect lseek(fd, 0, SEEK_END) to work unless the file descriptor is associated with a regular file, because file size information is not available at that level. Does this apply to all kinds of lseek(), including SEEK_SET and SEEK_CUR? Or maybe the offset must also be given in a multiple of 512 for some reason. If I give lseek(fd, 8193, SEEK_SET), will it actually do lseek(fd, 8192, SEEK_SET)?

Thanks for any help.

-- Zhihui Zhang. Please visit http://www.freebsd.org
Re: Accessing special device files
On Tue, 1 Jun 1999, Wes Peters wrote:

> ???
>
> dd verifies the behavior you report:
>
> r...@homer# dd if=/dev/rwd0s2b of=/dev/null bs=1
> dd: /dev/rwd0s2b: Invalid argument
> ...
> r...@homer# dd if=/dev/rwd0s2b of=/dev/null bs=512
> ^C18805+0 records in
> ...
>
> w...@homer$ ls -l /dev/*wd0s2a
> crw-r- 1 root operator 3, 0x0003 Apr 1 11:10 /dev/rwd0s2a
> brw-r- 1 root operator 0, 0x0003 Apr 1 11:10 /dev/wd0s2a
>
> The rwd device is clearly a character-special device, the wd device a
> block special. Character devices can always be read byte-at-a-time,
> by definition. When did the semantics of this change?

I have verified, from the source code, the requirement that this character device must be read in multiples of 512 (the disk involved is an IDE drive):

When we call the read(int d, void *buf, size_t nbytes) system call, the argument nbytes is passed on to the iov_len field of an iovec structure (see file sys_generic.c). Later, the routine vn_read() in file vfs_vnops.c is called via the fileops structure, and the uio structure is passed along. vn_read() calls spec_read() via VOP_READ() because we are talking about a raw device file. spec_read() calls wdread() via the cdevsw table. wdread() calls physio(), where b_bcount of a buffer is set to iov_len. The routine wdstrategy(), invoked by physio(), checks whether bp->b_bcount % DEV_BSIZE != 0. If it detects a request size that is not a multiple of 512, it sets b_error = EINVAL. This error is picked up by physio() and returned.

Thanks for your help.

-Zhihui
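The check at the end of that call chain can be modelled in a few lines. This is only a user-space sketch of the test described above, not the wd driver code: DEV_BSIZE is assumed to be 512 as in the thread, and raw_request_check() is a made-up name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEV_BSIZE 512   /* sector size assumed in the discussion above */

/*
 * Model of the size test described for wdstrategy(): a raw transfer
 * whose byte count is not a whole number of sectors is rejected.
 * Returns 0 if the request would be accepted, EINVAL otherwise.
 */
int
raw_request_check(size_t bcount)
{
    if (bcount % DEV_BSIZE != 0)
        return (EINVAL);
    return (0);
}
```

Under this model a one-byte read (dd bs=1) fails with EINVAL while bs=512 or bs=8192 passes, which is consistent with the dd output quoted above.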
The choice of MAXPHYS
The value of MAXPHYS, the maximum raw I/O transfer size, is chosen to be 64K. I am wondering why it is not set larger. The maxcontig value of FFS defaults to 16, which means 16*8192 or 128K bytes (twice as big as 64K). If we raise the value of MAXPHYS, we can put more data blocks of a big file contiguously on the disk (perhaps even more than 16 blocks to achieve better performance). Am I right? Is there any limit on the value of MAXPHYS? Any help is appreciated.

-Zhihui
Re: allocate file blocks contiguously
> For more info about maxcontig, you can refer to the well-known
> paper of McKusick et al. about the Fast File System. It is a parameter
> that is hardware dependent. You can't get performance just by
> increasing its value. Unfortunately, I don't have an on-line version
> of that paper.
>
> --Farshid
>
> On Wed, 2 Jun 1999, Zhihui Zhang wrote:
>
> > In FFS, there is a parameter called maxcontig (default to 16) that
> > determines the number of blocks we can allocate contiguously for a single
> > file. What is its optimal value? I mean, if we allocate ALL the data
> > blocks of a very big file contiguously, will its I/O performance be
> > improved greatly? It seems to me this number may also be limited by
> > system buffering capability (MAXPHYS?) and underlying hardware controller.
> > Can anyone give me some hints on the choice of the value of maxcontig?

I read the paper at http://docs.FreeBSD.org/44doc/, which is basically the same as in the 4.4 BSD book (p276). My feeling is that if we allocate ALL the data blocks of a big file contiguously, this will lead to "too much localization" as described in the paper (or the book). However, this may be good for this big file if the system buffering capability and hardware allow it (at the cost of other files?).

Regards,
Zhihui
help with I/O optimization with object
While studying the file ufs_readwrite.c, I see routines like uiomoveco() that calls vm_uiomove() in vm_map.c. I am almost sure that these are new in FreeBSD 3.x. The comment in ffs_read() says "not a VM based I/O requests" == "not headed for the buffer cache". This does not make sense to me although I understand something about VMIO buffers and non-VMIO buffers. I hope someone can explain the basic ideas of I/O optimization with VM object (relating to the OBJ_OPT flag and the global variable vfs_ioopt) so that I can understand the code easier. Any help is appreciated. ---------- Zhihui Zhang. Please visit http://www.freebsd.org -- To Unsubscribe: send mail to majord...@freebsd.org with "unsubscribe freebsd-hackers" in the body of the message
Re: help with I/O optimization with object
On Mon, 7 Jun 1999, Zhihui Zhang wrote: > > While studying the file ufs_readwrite.c, I see routines like uiomoveco() > that calls vm_uiomove() in vm_map.c. I am almost sure that these are new > in FreeBSD 3.x. The comment in ffs_read() says "not a VM based I/O > requests" == "not headed for the buffer cache". This does not make sense > to me although I understand something about VMIO buffers and non-VMIO > buffers. I hope someone can explain the basic ideas of I/O optimization > with VM object (relating to the OBJ_OPT flag and the global variable > vfs_ioopt) so that I can understand the code easier. > After searching the mailing list archive for some time and tracing down who calls vm_uiomove(), it seems to me that this is the zero copy read stuff used to read data into the current process' address space. However, I do not know when it can be useful or any more details. -Zhihui To Unsubscribe: send mail to majord...@freebsd.org with "unsubscribe freebsd-hackers" in the body of the message
What is FTW?
In the FAQ of FreeBSD 2.X, 13.12. Alternative layout policies for directories, there is the following sentence: Most filesystems are created from archives that were created by a depth first search (aka ftw). What does ftw stand for (My guess is File Tree Walk)? Can anyone give me examples of programs that create archives from a file tree in a depth first way? Do these programs rebuild the file tree from archive exactly as they were created? Any help is appreciated. -- Zhihui Zhang. Please visit http://www.freebsd.org -- To Unsubscribe: send mail to majord...@freebsd.org with "unsubscribe freebsd-hackers" in the body of the message
Re: What is FTW?
On Wed, 9 Jun 1999, Zhihui Zhang wrote: > > In the FAQ of FreeBSD 2.X, 13.12. Alternative layout policies for > directories, there is the following sentence: > > Most filesystems are created from archives that were created by a depth > first search (aka ftw). > > What does ftw stand for (My guess is File Tree Walk)? Can anyone give me > examples of programs that create archives from a file tree in a depth > first way? Do these programs rebuild the file tree from archive exactly as > they were created? > I have just found that ftw does stand for File Tree Walk and there is a C library routine named ftw() (XPG4 standard) in AIX and HP-UX. However, I can not find the same routine in FreeBSD manual pages. Maybe it is not supported by FreeBSD. -Zhihui To Unsubscribe: send mail to majord...@freebsd.org with "unsubscribe freebsd-hackers" in the body of the message
The clean and dirty buffer list of a vnode
What does VOP_FREEBLKS() do?
In the routine ffs_blkfree() I find a new statement:

    VOP_FREEBLKS(ip->i_devvp, fsbtodb(fs, bno), size);

which calls spec_freeblks() in file spec_vnops.c. The routine spec_freeblks() looks simple. When D_CANFREE is set, it gets an empty buffer and calls the strategy routine for the buffer. Since B_READ is not set, we must be calling the strategy routine to write some data. But where is the data for the buffer? Why do we call VOP_FREEBLKS() at the time we are going to free the blocks? BTW, this vnode operation is not listed in the man pages. Any help is appreciated.

-- Zhihui Zhang. Please visit http://www.freebsd.org
Jeroen Ruigrok/Asmodai's project
Dear Jeroen Ruigrok/Asmodai: I received your email concerning your documentation project a week ago. I tried to respond a couple of times, but I could not reach your private email address. I have written a much longer email. Anyway, I am afraid that being a one year old newbie I could not help as much as you expect. I appreciate all the help I have received from you and others on this list. -- Zhihui Zhang. Please visit http://www.freebsd.org -- To Unsubscribe: send mail to majord...@freebsd.org with "unsubscribe freebsd-hackers" in the body of the message
Difference between msync() and fsync()
After we mmap a file, we can write back the dirty pages of the file either by calling msync() or fsync(). After reading the source code, it seems to me that they actually do the same thing. msync() will eventually call VOP_FSYNC(), as fsync() does. Since msync() has already called the routine vm_object_page_clean() to write back the dirty pages of the file, VOP_FSYNC() really does not have much left to do except update the inode. So is there any real difference between msync() and fsync() on mmapped files? Or are they simply two alternative ways to do the same thing? Thanks for any help.

-- Zhihui Zhang. Please visit http://www.freebsd.org
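To make the two user-level paths concrete, here is a small sketch that dirties a page through a shared mapping and then flushes it with either msync() or fsync(). The function name mmap_store_and_flush() is invented for this example. It assumes a system with a unified page cache, where fsync() on the descriptor also covers pages dirtied through the mapping; POSIX itself only promises this for msync(). The read-back via pread() shows that the data is visible through the file, not that it has reached the disk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Store `msg` at the start of a temporary file through a MAP_SHARED
 * mapping, flush with msync() (use_msync != 0) or fsync() (use_msync == 0),
 * then read the bytes back with pread().
 * Returns 0 if the bytes match, -1 on any failure.
 */
int
mmap_store_and_flush(const char *msg, int use_msync)
{
    char path[] = "/tmp/msyncXXXXXX";
    char back[64];
    size_t len;
    char *p;
    int fd, rc = -1;

    assert(msg != NULL);
    len = strlen(msg);
    assert(len > 0 && len <= sizeof(back));

    if ((fd = mkstemp(path)) < 0)
        return (-1);
    unlink(path);                   /* file persists until close() */
    if (ftruncate(fd, (off_t)len) == 0 &&
        (p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
        fd, 0)) != MAP_FAILED) {
        memcpy(p, msg, len);        /* dirty the page via the mapping */
        if ((use_msync ? msync(p, len, MS_SYNC) : fsync(fd)) == 0 &&
            pread(fd, back, len, 0) == (ssize_t)len &&
            memcmp(back, msg, len) == 0)
            rc = 0;
        munmap(p, len);
    }
    close(fd);
    return (rc);
}
```

From the caller's side the visible difference is the argument: msync() names an address range in the mapping, while fsync() names the whole file by descriptor.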
Implementation of mmap() in FreeBSD
Re: RE: Implementation of mmap() in FreeBSD
> Because we can't realign the data in the pages without doing a buffer
> copy. To force mmap() to align the data to the start of the page requires
> it to allocate memory and copy the in-core disk cache to the new memory.
>
> This is extremely wasteful of cpu and memory. The current UNIX mmap
> implementation is able to simply map the existing in-core disk cache
> directly to the process - no buffer copying is required at all, and
> it is extremely memory efficient.

I guess you are talking about VMIO buffers, where the pages are found and registered into the buffer header during allocbuf(). When we do I/O on VMIO buffers using the conventional system call method, we specify UIO_NOCOPY to instruct uiomove() not to perform a data copy.

> Programmers who use mmap() expect it to be as close to optimal as
> possible.

I wrote a program to test mmap() today. It turns out that a user can modify the part of the mmapped area that is within the area returned by the system but not part of the area the user asked for.

As I understand it, there are two access paths to a file: conventional I/O through the read/write system calls, and memory-mapped I/O. Both of them converge at the vnode read and write routines (VOP_READ() and VOP_WRITE()). This should give us the opportunity to guard against illegal memory-mapped I/O writes made by the user.

Maybe we can add some fields to the vm_object to record the real, user-specified area, which can be passed to the vnode read and write routines. In the vnode I/O routine, we should be able to limit the write to only the original part of the area specified by the user. This practice should not incur any performance loss.

-Zhihui
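A test along the lines described above might look like the following sketch. It is not the original test program; poke_past_requested_length() is a made-up name, and the behaviour shown relies on the mapping being rounded up to a whole page, which is what the thread reports.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map only the first `want` bytes of a one-page temporary file, then
 * store a byte at offset `probe`, which lies beyond `want` but inside
 * the same page. Returns the byte read back from the file with pread(),
 * or -1 on error.
 */
int
poke_past_requested_length(size_t want, size_t probe)
{
    char path[] = "/tmp/mmaptestXXXXXX";
    long pagesz = sysconf(_SC_PAGESIZE);
    unsigned char back = 0;
    char *p;
    int fd, rc = -1;

    assert(pagesz > 0 && want > 0);
    assert(want <= probe && probe < (size_t)pagesz);

    if ((fd = mkstemp(path)) < 0)
        return (-1);
    unlink(path);                   /* file persists until close() */
    if (ftruncate(fd, (off_t)pagesz) == 0) {
        p = mmap(NULL, want, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            p[probe] = 'X';         /* outside [0, want), inside the page */
            if (msync(p, want, MS_SYNC) == 0 &&
                pread(fd, &back, 1, (off_t)probe) == 1)
                rc = back;
            munmap(p, want);
        }
    }
    close(fd);
    return (rc);
}
```

Calling poke_past_requested_length(100, 2000) maps 100 bytes and writes at offset 2000; if the store shows up in the file, the write outside the requested range went through.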
Re: RE: Implementation of mmap() in FreeBSD
On Mon, 28 Jun 1999, Matthew Dillon wrote:

> :> it is extremely memory efficient.
> :
> :I guess you are talking about VMIO buffers where the pages are found and
> :registered into the buffer header during allocbuf(). When we do I/O on
> :VMIO buffers using conventional system call method, we specify UIO_NOCOPY
> :to instruct the uiomove() do not perform data copy.
>
>     UIO_NOCOPY is used to handle a degenerate case in the VFS/BIO vs VM
>     interaction for I/O, it has nothing to do with the read() or write()
>     syscall per se, nor is it related to the mmap code.
>
> :> Programmers who use mmap() expect it to be as close to optimal as
> :> possible.
> :
> :I write a program to test the mmap() today. It turns out that a user can
> :modify the part of the mmapped area that is within the system returned
> :area but not part of the user-specified area.
> :
> :As I understand it, there are two access paths to a file: conventional I/O
> :through read/write systems calls and memory-mapped I/O. Both of them
> :converge at the vnode read and write routine (VOP_READ() and VOP_WRITE()).
> :This should give us the opportunity to guard against illegal memory-mapped
> :I/O writes made by the user.
>
>     They converge in the VMIO page cache.

By converge, I mean that VOP_GETPAGES() and VOP_PUTPAGES() will call VOP_READ() and VOP_WRITE(), just as the read() and write() system calls do.

> :Maybe we can add some fields in the vm_object to record the real or
> :user-specified area which can be passed to the vnode read and write
> :routine. In the vnode I/O routine, we should be able to limit the write to
> :only the original part of the area specified by the user. This practice
> :should not incur any performance loss.
> :
> :-Zhihui
>
>     mmap bypasses the vnode. What you propose will not work because even if
>     the VM object is process-specific, the pages underlying the VM object are
>     not. If several processes are mmap()ing overlapping portions of the file,
>     they are *sharing* the pages. So even though they are not sharing the
>     VM object, the VM system will not be able to tell which process modified
>     the page, and therefore any byte-ranged limits specified in the VM object
>     will be useless.

This is a good point! I had never thought of it before. Thanks.

-Zhihui
A way to crash system (3.1 & 3.2) with floppy
Suppose you have a *write-protected* DOS floppy and you do:

    # mount -t msdos /dev/fd0 /floppy    <-- this is OK
    # cp somefile /floppy                <-- a lot of error messages
    # umount /floppy                     <-- crash

Now the system tries to sync the dirty buffers and fails. You have to press a key to reboot. Is there anything wrong here, or does FreeBSD simply not handle this in a more elegant way? Thanks for any help.

-- Zhihui Zhang. Please visit http://www.freebsd.org
reason for slow user-user memory copy
A graduate student here has implemented an mmap() interface to a TCP/IP network card. He notices that it takes much longer to copy from the mmap()'ed area to another user area than it takes to copy the same amount of data from kernel space to user space. The students here have no idea why this could happen. I hope someone on this list can give us a hint. Below is a part of his original email. He uses the rdtsc instruction to do the timing.

    Well I have implemented a memory mapped interface for the user in Linux
    using the DEC 21140 Tulip ethernet card. Thus the user has access to the
    buffers, but when I did a memcpy from the RX buffer to the user variable,
    it took an extraordinary amount of time, approx 70 microsec for 1460
    bytes... whereas the original scheme takes 25 microsec for the same data
    when it does a memcpy_to_iovec in tcp_recvmsg().

    I am confused by these unexpected timings. More than 80% of the time is
    spent doing the memcpy.

Thanks for your help.

-- Zhihui Zhang. Please visit http://www.freebsd.org
Re: reason for slow user-user memory copy
On Thu, 1 Jul 1999, David Greenman wrote: > >A graduate student here implements a mmap() interface to a TCP/IP network > >card. He notices that it takes much longer time to copy from mmapp()'ed > >area to another user area than it takes to copy the same amount of data > >from kernel space to user space. The students here have no idea why this > >could be possible. I hope someone on this list can give us a hint. Below > >is a part of his original email. He uses rdtsc instruction to do the > >timing. > > > > > >Well I have implemented a memory mapped interface for the user in Linux > >using the DEC 21140 Tulip ethernet card. Thus the user has access to the > >buffers, but when I did a memcpy from the RX buffer to the user variable, > >it took an extraordinary amount of time, approx 70 microsec for 1460 > >btyes... where as the original scheme takes 25 microsec for the same data > >when it does a memcpy_to_iovec in tcp_recvmsg(). > > > >I am confused by this unexpected timings. More than 80% of the time is > >spent doing the memcpy. > >--- > >If the mapping is being done via a device mapping, then the region will > be marked non-cacheable. > > -DG I remember that he said he created a character device /dev/tulip to represent the network card. Actually, his work borrowed a lot from the Cornell U-Net project (now the basis of VIA?). Can we change the corresponding page table (directory) entries to be cacheable as needed? -Zhihui To Unsubscribe: send mail to majord...@freebsd.org with "unsubscribe freebsd-hackers" in the body of the message
Overwrite an executable file that is running
For a big executable file that is being run by the OS, not all of its contents may be loaded into memory. At the same time, the developer gets impatient and wants to create a new version of the same file. He could modify the makefile to output the new version to a different file name, but this is tedious. The new version should not overwrite the older version of the file that is being run. My question is: how does FreeBSD prevent this from happening? Can anyone point out where in the source code this is handled? Thanks a lot.

-- Zhihui Zhang. Please visit http://www.freebsd.org
Wrong comment in VM code?
At the beginning of the file vm_object.c, we have the following comment:

    The only items within the object structure which are modified
    after time of creation are:
        reference count    locked by object's lock
        pager routine      locked by object's lock

But at the end of vnode_pager_setsize(), we modify the size field. So at least three items can be modified after creation. Am I right? Thanks for any help.

-- Zhihui Zhang. Please visit http://www.freebsd.org
Help with PCI code understanding
Can someone outline the initialization process of PCI devices in FreeBSD? I know much of the basic PCI material introduced in the book "PCI System Architecture". I just want to know how each driver is registered into some linker set and how its probe routine gets called. In other words, I want to know the major data structures and routines and their relationships. I wonder if there is already a document somewhere. Any help is appreciated.

-Zhihui
Buffer emergency reserve
While looking at the code in vfs_bio.c, I noticed the existence of low and high free buffer counters. The comments say they are there to give some special processes, like the buf daemon, access to an emergency reserve. I just don't get the reason for having this emergency reserve. Do we allocate buffers in an interrupt environment? Do we need extra buffers in order to free buffers? Please shed some light on this for me. Thanks.

-Zhihui
Re: Buffer emergency reserve
Thanks! I am wondering whether the free VM page reserve exists for a similar reason, i.e., to clean dirty pages you need more pages. Probably not; that reserve is for interrupt routines that cannot block.

On Wed, 18 Apr 2001, Alfred Perlstein wrote:

> * Zhihui Zhang <[EMAIL PROTECTED]> [010418 09:18] wrote:
> >
> > While looking at the code in vfs_bio.c, I notice the existence of low and
> > high free buffer counters. The comments say they are there to give some
> > special process like buf daemon access to emergency reserve. I just
> > don't get the reason for having this emergency reserve. Do we allocate
> > buffer in an interrupt environment? Do we need extra buffers in order to
> > free buffers? Please shed a light on this for me. Thanks.
>
> It's really a simple issue of:
>
> "sometimes to clean a buffer we need one or more buffers"
>
> Think of some random data block at the far end of a large file.
>
> If the indirect blocks aren't in memory you will need to bring
> them in to lookup the location of the buffer you're writing
> because buffers use logical offsets rather than physical ones.
>
> --
> -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
cv_wait() or sv_wait() in FreeBSD
Do we have condition variable or synchronization variable support in FreeBSD? If not, is there any alternative mechanism to use in the kernel? Thanks.

-Zhihui
shared versus exclusive lock
According to my reading of kern_lock.c, it does support shared locks. However, we are still using LK_EXCLUSIVE mode more often than necessary. If I want to look up a directory or to read a buffer, I should be able to use an LK_SHARED lock. Right now, I have found only a few places that use LK_SHARED, such as vn_read(). Is there any reason behind this? If I want to change this in my code, is there anything I should pay special attention to? Thanks.

-Zhihui
Confusion with mknod() and devfs
There is the following comment inside ufs_mknod():

    /*
     * Remove inode, then reload it through VFS_VGET so it is
     * checked to see if it is an alias of an existing entry in
     * the inode cache.
     */

I really cannot understand it. For each new disk inode, we call ufs_vinit() from ffs_vget(), and ufs_vinit() calls addaliasu() to add the vnode to the alias list. So why reload? The alias vnode is already handled after it calls ufs_makeinode().

Since DEVFS is in use, will it prevent a user from creating alias names for the same device? If so, there is no need to handle aliases in the kernel.

According to the red daemon book, alias vnodes are used to keep the cache coherent (vp as a key). But the getblk() code does not seem to check it. This makes me feel the code is there for historical reasons.

Thanks for any clarification.

-Zhihui
Re: Confusion with mknod() and devfs
On Fri, 22 Jun 2001, Terry Lambert wrote:

> Zhihui Zhang wrote:
> > According to the red daemon book, alias vnodes are used to make cache
> > coherent (vp as a key). But getblk() stuff does not seem to check it.
> > This makes me feel the code is there for historical reasons.
>
> The "BSD 4.4" book was written about a system without a
> unified VM and buffer cache. The aliases it is talking
> about are the buffers hung off a file vnode and the
> buffers hung off a device vnode, from which that file
> was being read.

I think you got me wrong. I was talking about a device with more than one name. So we can have more than one vnode for the same device. (If there is more than one name for the same device in the same FS, they can share the vnode; otherwise, they cannot.) Specifically, I fail to understand why we reload the inode in ufs_mknod():

    /*
     * Remove inode, then reload it through VFS_VGET so it is
     * checked to see if it is an alias of an existing entry in
     * the inode cache.
     */
    vput(*vpp);
    (*vpp)->v_type = VNON;
    /* Save this before vgone() invalidates ip. */
    ino = ip->i_number;
    vgone(*vpp);
    error = VFS_VGET(ap->a_dvp->v_mount, ino, vpp);

I wonder whether, with the use of DEVFS, special device aliases may no longer exist, because the device nodes are created by the kernel instead of by administrators.

-Zhihui

> The reason getblk() doesn't check it is that the cache is
> maintained as coherent, so there's no need, since the
> check is intended to permit explicit coherency operations
> to take place, when necessary. There is a lot of "missing"
> code you aren't seeing that is referenced by the book.
>
> It is still possible to create aliases, but they are done
> by having multiple vm_object_t's pointing to the same data
> blocks as backing objects. This only occurs in the case
> of stacking VFS's with a non-trivial relationship (e.g.
> where the backing object contents would not be the same
> between layers). It can also occur to some small extent
> in the NFS client FS case.
>
> -- Terry
Re: Confusion with mknod() and devfs
On Sat, 23 Jun 2001, Terry Lambert wrote:

> Zhihui Zhang wrote:
> > I think you got me wrong. I was talking about a device
> > with more than one names. So we can have more than one
> > vnode for the same device. (If there is more than one name
> > to the same device in the same FS, they can share the vnode,
> > otherwise, they cannot.)
>
> This is not how it works. The specfs/devfs will return
> the same vnode.
>
> A "special device" file type in the traditional sense is
> a major/minor/{block|character} tuple.
>
> The entry in an FS that references this is _not_ where
> the vnode comes from, it's a hint to tell the system to
> get the vnode from a single place, instead (specfs in a
> traditional system, vfs in a less traditional system).
>
> > Specifically, I fail to understand why we reload the inode
> > in ufs_mknod():
>
> Because when you make the node, you may have an existing
> open reference to the same major/minor/{block|character}
> tuple, and you don't want to duplicate it in the ihash
> cache.

Thanks. But I still don't get it. The ihash is keyed on i_dev (the device on which the filesystem is mounted) and i_number. If two names in one filesystem refer to the same device, then their inode numbers must be different. If two names from different filesystems refer to the same device, then their i_dev is different even if their inode numbers happen to be the same. So I do not see how we can avoid duplicate entries in the ihash cache.

-Zhihui
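The argument about the hash key can be written down as a toy model. Everything here is a stand-in for illustration: struct toy_inode and the two helper functions are invented names, and the only thing taken from the thread is that the inode hash is keyed on (i_dev, i_number) while the device a special file names is a separate field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an inode that describes a device special file. */
struct toy_inode {
    uint32_t i_dev;     /* device holding the file system (key, part 1) */
    uint32_t i_number;  /* inode number in that file system (key, part 2) */
    uint32_t i_rdev;    /* device the special file refers to (not in key) */
};

/* Two inodes share an inode-hash entry only if both key fields match. */
int
same_ihash_key(const struct toy_inode *a, const struct toy_inode *b)
{
    assert(a != NULL && b != NULL);
    return (a->i_dev == b->i_dev && a->i_number == b->i_number);
}

/* Two special files are aliases if they name the same underlying device. */
int
is_device_alias(const struct toy_inode *a, const struct toy_inode *b)
{
    assert(a != NULL && b != NULL);
    return (a->i_rdev == b->i_rdev);
}
```

In this model, two names for the same device always produce distinct hash keys, whether they live in the same file system (different i_number) or in different ones (different i_dev), which is the point made in the message above.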
trace a library call
Suppose I write a program that calls sbrk(). How can I trace into the function sbrk()? In this particular case, I want to know whether sbrk() calls the function in file lib/libstand/sbrk.c or sys/sbrk.S. Sometimes it is nice to see what system call is eventually called as well. I know dynamic linking may make this hard. But is there a way to do this? Thanks.

-Zhihui
Re: does data overflow in pipes
I guess the kernel will block a process that tries to write more data than the pipe can accommodate. If you are using non-blocking I/O, the write will return an error instead. -Zhihui On Wed, 27 Jun 2001, Manas Bhatt wrote: > hi all, > pipes uses only direct blocks to store data. so > depending on the blocksize , a total data of > 10*blocksize can be written in one go but what happens > if a writer process tries to write more 10*blocksize > of data in one go. Does the kernel overwrites the > data in pipe or not ? if yes, why? if not, then how > does it allow the writer to write more 10*blocksize of > data? > if someone can direct me to implementation > (source files), it would be great. > thanks > --manas
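The non-blocking behaviour described in the reply can be observed from user space. The following is only a minimal sketch using standard POSIX calls; the function name fill_pipe_nonblocking and the 512-byte chunk size are made up for illustration, and the number of bytes a pipe accepts before refusing more depends on the system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Fill a fresh pipe with non-blocking writes until the kernel refuses
 * more data. Returns the number of bytes accepted, or -1 if the pipe
 * could not be set up. *last_errno receives errno from the failed write.
 */
long fill_pipe_nonblocking(int *last_errno)
{
    int fds[2];
    char chunk[512] = {0};   /* small chunk, illustrative size only */
    long total = 0;

    assert(last_errno != NULL);
    if (pipe(fds) == -1)
        return -1;

    /* Put the write end into non-blocking mode. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Nobody reads from fds[0], so the pipe eventually fills up. */
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof(chunk));
        if (n == -1) {
            *last_errno = errno;   /* normally EAGAIN once the pipe is full */
            break;
        }
        total += n;
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Existing data in the pipe is never overwritten: without O_NONBLOCK the same write() would simply sleep until a reader drains the pipe.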
Re: trace a library call
sbrk() is not supported in FreeBSD as a system call (see file vm/vm_mmap.c). However, sbrk(0) can reflect the latest end of the heap. I am interested in how sbrk() interacts with malloc(). I know my question is too specific. Thanks for your answer. I did learn a lesson: mixing abstraction layers is really bad. -Zhihui On Thu, 28 Jun 2001, Terry Lambert wrote: > Zhihui Zhang wrote: > > > > Suppose I write a program that calls sbrk(). How can I trace into the > > function sbrk()? In this particular case, I want to know whether > > sbrk() calls the function in file lib/libstand/sbrk.c or sys/sbrk.S. > > Sometimes it is nice to see what system call is eventually called as well. > > I know dynamic linking may make this hard. But is there a way to do > > this? Thanks. > > sbrk() is a system call, not a library call. It has a > stub that just loads a register with the call ID and > does an INT 0x80. > > You can't "trace into" it, since you are in a user space > program. > > If you want to see how it works, the sources are in /sys; > but all it does is add pages to the end of the address > space, in the heap. > > If you are having problems with it, you are probably using > sbrk() and malloc() in the same program. Don't do that; > malloc() traditionally calls sbrk() to get pages, so you > will have the same effect as trying to use fopen() and > open() in the same program: mainly, that fd manipulation > routines can close/open/etc. fd's out from under file > pointers. In the sbrk() case, there can be attempts to > (re)map pages to regions where they don't really belong. > > -- Terry > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: trace a library call
I am sorry. It turns out when the argument is zero, sbrk() does not enter into the kernel. If it does, it will return not supported. -Zhihui On Thu, 28 Jun 2001, Zhihui Zhang wrote: > > sbrk() is not supported in FreeBSD as a system call (see file > vm/vm_mmap.c). However, sbrk(0) can reflect the latest end of the heap. I > am interested in how sbrk() interacts with malloc(). I know my question is > too specific. Thanks for your answer. I did learn a lesson: mixing > abstraction layers is really bad. > > -Zhihui > > On Thu, 28 Jun 2001, Terry Lambert wrote: > > > Zhihui Zhang wrote: > > > > > > Suppose I write a program that calls sbrk(). How can I trace into the > > > function sbrk()? In this particular case, I want to know whether > > > sbrk() calls the function in file lib/libstand/sbrk.c or sys/sbrk.S. > > > Sometimes it is nice to see what system call is eventually called as well. > > > I know dynamic linking may make this hard. But is there a way to do > > > this? Thanks. > > > > sbrk() is a system call, not a library call. It has a > > stub that just loads a register with the call ID and > > does an INT 0x80. > > > > You can't "trace into" it, since you are in a user space > > program. > > > > If you want to see how it works, the sources are in /sys; > > but all it does is add pages to the end of the address > > space, in the heap. > > > > If you are having problems with it, you are probably using > > sbrk() and malloc() in the same program. Don't do that; > > malloc() traditionally calls sbrk() to get pages, so you > > will have the same effect as trying to use fopen() and > > open() in the same program: mainly, that fd manipulation > > routines can close/open/etc. fd's out from under file > > pointers. In the sbrk() case, there can be attempts to > > (re)map pages to regions where they don't really belong. > > > > -- Terry > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
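One way to look at how malloc() and the program break interact is to sample sbrk(0) around an allocation. This is only a rough sketch: the function name break_growth_during_malloc is hypothetical, sbrk() is a legacy interface that some platforms no longer provide, and an allocator that obtains memory with mmap() will show no movement of the break at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Return how many bytes the program break moved while malloc()
 * served a request of n bytes. Zero means the allocator did not
 * need to (or does not) extend the heap with brk/sbrk.
 */
intptr_t break_growth_during_malloc(size_t n)
{
    void *before = sbrk(0);          /* sbrk(0) only reports the current break */
    assert(before != (void *)-1);

    void *p = malloc(n);
    void *after = sbrk(0);           /* sample again before freeing */

    free(p);                         /* free(NULL) is harmless if malloc failed */
    return (intptr_t)((char *)after - (char *)before);
}
```

Calling it with increasing sizes shows when, if ever, the allocator in use extends the heap through the break.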
Max DMA size
Can anyone tell me what factors determine the max DMA size (the DMA counter on each controller, or something related to the PCI bus)? What is the typical max DMA size for a SCSI disk connected to a PCI bus? It seems to be much larger than MAXPHYS (128K). If so, does it mean we are not using the full potential of DMA? And what would be the problem if we enlarged MAXPHYS? Any help is appreciated. -Zhihui
KLD, kernel threads, zone allocator
I am writing a KLD that gives me a kernel fault each time I run the 'ps' command after 'make unload'. The KLD has a system call that creates several kernel threads by calling kthread_create(). During unload, I set a flag for each thread so that it will call exit1() upon wakeup (the threads sleep with a timeout). Before the last thread calls exit1(), it wakes up the KLD unload process so that 'make unload' can finish. Is there anything wrong with this, or is there a better solution? I also use vm_zone to allocate some data structures within the KLD. When unloading, I can use zfree() to free them, except for the zone header, which I cannot free with free(some_zone, M_ZONE) because M_ZONE is defined as *static* in vm_zone.c. I wonder if this will cause a memory leak after loading and unloading the KLD several times. Finally, I want to know how to save the panic screen without writing it down by hand. Is there any info on debugging at the db> prompt after a fault? Any help is appreciated. -Zhihui
memory type and its size
Must kernel memory of the same type (e.g., M_TEMP) always be allocated (using malloc()) with the same size (or range of sizes)? BTW, how can I display mbuf cluster usage info? Thanks. -Zhihui
Re: memory type and its size
On Thu, 20 Jul 2000, Zhihui Zhang wrote: > > Does kernel memory of the same type (e.g., M_TEMP) must be allocated > (using malloc()) with the same (range of) size? BTW, how to display mbuf > cluster usages info. Thanks. A memory type can have memory blocks of different sizes. Use netstat -m to display mbuf cluster usage. -Zhihui
Re: KLD, kernel threads, zone allocator
On Mon, 17 Jul 2000, Zhihui Zhang wrote: > > I am writing a KLD that gives me kernel fault each time I run 'ps' command > after 'make unload'. The KLD has a system call to create several kernel > threads by calling kthread_create(). During unload, I set flags to each > threads so that they will call exit1() upon wakeup (sleep on a timeout). > Before the last thread calls exit1(), it wakeup the kld unload process so > that make 'unload' can finish. Is there anything wrong or better > solutions? > > I also use vm_zone to allocate some data structes within the KLD. When > unloading, I can use zfree() to free them except the zone header that I > can not free(some_zone, M_ZONE). This is because M_ZONE is defined as > *static* in vm_zone.c I wonder if this will cause memory leak after > several loading and unloading the KLD. > > Finally, I want to know how to save the panic screen without hand writing > it down. Any info on debugging under db> after fault? > > Any help is appreciated. Thanks to those who have helped me privately. It is not a good idea to use zone allocator with KLD. You must clear everything before unloading the KLD. Any kernel threads can be reparented to initproc to avoid 'ps' panic. -Zhihui To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
bridge driver and Yamaha YMF 724
Does 4.1-Release support YAMAHA PCI Audio Controller YMF 724? I have tried the suggestion given by man pcm without success. By the way, what is a card with bridge driver support and a PnP card as mentioned by man pcm? Thanks for your help. -Zhihui To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
recompiling boot blocks & serial console
I want to set up a serial console on a FreeBSD 4.1 box. I followed the instructions at http://www.mostgraveconcern.com/freebsd/ and tried the following: # cd /sys/boot/i386/boot2 # make clean # make I got "cannot open ../btx/lib/crt0.o". What happened? Besides, I want to use another FreeBSD box as the console. Can I use kermit as the terminal program? If so, can I configure it the same way as for normal login purposes? Any help is appreciated. -Zhihui
Re: recompiling boot blocks & serial console
On Thu, 10 Aug 2000, Mike Smith wrote: > > > > I want to set up a serial console on a freebsd 4.1 box. I follow the > > instructions at http://www.mostgraveconcern.com/freebsd/. I tried to do > > the following: > > Put > > -h > > in /boot.config. Now you have a serial console. Yes! Two more quick questions: how to change baud rate? Can kermit capture the output? (I use kermit on the other FreeBSD machine). -Zhihui To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: recompiling boot blocks & serial console
On Thu, 10 Aug 2000, Mike Smith wrote: > > On Thu, 10 Aug 2000, Mike Smith wrote: > > > > > > > > > > I want to set up a serial console on a freebsd 4.1 box. I follow the > > > > instructions at http://www.mostgraveconcern.com/freebsd/. I tried to do > > > > the following: > > > > > > Put > > > > > > -h > > > > > > in /boot.config. Now you have a serial console. > > > > Yes! Two more quick questions: how to change baud rate? Can kermit > > capture the output? (I use kermit on the other FreeBSD machine). > > You need to recompile the bootblocks to change the baudrate; set > BOOT_COMCONSOLE_SPEED in /etc/make.conf, then do: > > # cd /sys/boot > # make clean cleandepend > # make depend && make && make install > # disklabel -B > Done! I use Windows 98 HyperTerminal right now because I do not know which Unix terminal program can capture its output into a file. Thanks! BTW, the web page should tell readers to do "make cleandepend" and "make depend". -Zhihui To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Digital Technical Journal - zhihui
Kanad: I remember you subscribed to some journal a while ago. Was it the "Digital Technical Journal"? I found two papers on VAXcluster filesystem design in No. 5, September 1987. If so, and you happen to have kept that issue, please lend it to me for a while. Thanks. Regards, -Zhihui
kernel debugging on 4.1-release
I try to trace some system call using remote debugging and find something that I can not explain myself (the related source is ffs_write()): case 1: --- 443 if (object) (gdb) break 430 Breakpoint 6 at 0xc0289cea: file ../../ufs/ufs/ufs_readwrite.c, line 430. (gdb) c Continuing. Breakpoint 6, ffs_write (ap=0xc64f5e70) at ../../ufs/ufs/ufs_readwrite.c:438 438 p = uio->uio_procp; In the above case, even if I set breakpoint 6 at line 430, it insists on line 438. case 2: --- (gdb) print p->p_limit $1 = (struct plimit *) 0x In the above case, the statement has just used p->p_limit to do some comparison and yet gdb says its value is -1. The statement using it is: if (vp->v_type == VREG && p && uio->uio_offset + uio->uio_resid > p->p_rlimit[RLIMIT_FSIZE].rlim_cur) { Are these bugs of gdb or am I doing something wrong? I notice that 4.1-release install KLD files at the same time you install kernel. In the past, I only copy the file kernel.debug to the target machine. Do I have to copy those .ko files to the target machine as well? Any help is appreciated. -Zhihui To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
delayed write question
I am wondering what exactly happens if a delayed write goes wrong. It seems to me that the kernel just clears the error flag and marks the buffer as a delayed write again, which gives the buffer a second chance. But how many chances can a buffer get at most before the write is aborted? This may not seem serious on a local filesystem, but consider the NFS case: if a delayed write to an NFS server fails, how many times will we retry? My understanding is that the user program will not notice these retries or aborts until it closes the file. Am I right? Please clarify this for me. Before 4.0, if we wrote something to a write-protected floppy, the system would panic. Obviously, this panic does not happen on 4.0+, so I guess that the retries must have a limit. Any help is appreciated. -Zhihui
Where is PType in /stand/sysinstall defined?
In the FDISK-like menu of /stand/sysinstall, the PType (partition type) column is given values like 1,2,3,4,6. While the subtype field is well-defined (e.g., 0xa5 = freebsd), I can not find where the partition type is explained. I also tried PCguide in vain. Can somebody explain this to me? Is it useful or some obsolete feature? Thanks. -Zhihui To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
tr command in DDB
Hi, I always like the command "db> tr 123" in DDB. Is there an equivalent command in gdb? Thanks. -Zhihui -- ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: vfs.vmiodirenable undocumented
100% agreed. In this particular vmiodirenable case, you can search the mailing list archive and will find that people have discussed it at least one year ago. Plus, if you still do not understand it, read the book "The design and Implementation of the 4.4 BSD Operating System". Anyway, when you get something free, you should be grateful and not complain its quality because you have not paid for it. -Zhihui On Wed, 11 Jul 2001, Jordan Hubbard wrote: > From: Sheldon Hearn <[EMAIL PROTECTED]> > Subject: Re: vfs.vmiodirenable undocumented > Date: Wed, 11 Jul 2001 11:21:24 +0200 > > > I'm very concerned with the fact that this style of response has become > > commonly accepted within the FreeBSD community. > > I would have to disagree that this is an area of concern. > > Let's take it from the other perspective: There are a lot of clueless > individuals out there who just don't understand the volunteer nature > of open source and think it's fine to walk up and post criticisms on > the bulletin board without any truly helpful suggestions, or to demand > work of volunteers rather than offering to ASSIST them in their > efforts. It happens all the time, and each time it does it serves to > disillusion the volunteers just a little bit more as they wonder just > why they're doing this for such an ungrateful pack of cretins. > > In such instances, I'd much rather have the volunteer vent a little > steam and perhaps feel better rather than bottle it up until one day > it just becomes all too much and they walk away from the project > entirely. I'm not being alarmist or dramatic in painting that picture > either because it's happened more times than I like to think about. > > It's also the case that people tend to only really learn lessons when > they're hard lessons, and if getting a public spanking (albeit a mild > one in this case) is what it takes to really drive the point home then > I'll be the first to hand out paddles. 
Some people, like Mr Xu here, > are even more resistant to clue transfer than most (just read the > archives) and, if anything, Bruce was being rather admirably > restrained with his response. > > In short, your approach may be fine one for conducting sensitivity > training at the Oh Shamalu Spiritual Center, but I'm not sure it's > appropriate here. This is the freebsd-hackers mailing list, and if > you can't take a little engineering heat then this is probably the > wrong place for you. Not everything in life needs to be "kinder and > gentler", to borrow words from George Bush, and I suspect the folks > who run police academies and military training programs would be the > first to agree with me. :) > > - Jordan > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-hackers" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
SPARE_USRSPACE
Can anyone tell me why FreeBSD has 256 bytes of spare space in the user area? Thanks. -Zhihui To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: KLD Programming
Yes. But it is not easy. Look at code vfs_vnops.c. You can let a user process open a file and then push the file descriptor into kernel via a special system call. Search the mailing list archive and you will find discussions on how to add a new system call. -Zhihui On Wed, 18 Jul 2001, suid wrote: > > Godday. > > I'm quite new to KLD-programming and have a question: > > Is it possible to read/write to files from a module without > too much effort, but still staying in kernelspace? > > > /suid- > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-hackers" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: using syscalls in a module (stack problem ?)
Just out of curiosity: Linux's kernel stack is one page. Where in the kernel source code does it say that we can have a two-page kernel stack instead of a one-page one? -Zhihui On Mon, 23 Jul 2001, Eugene L. Vorokov wrote: > > > I call this function with (curproc, PATH_MAX+1), and everything is fine > > > when I have just a few local variables defined in the caller (it all > > > works on MOD_LOAD only). However, if I have 2 buffers, 4096 bytes each, > > > as local variables and then try to allocate userspace memory the same > > > way, kernel crashes - sometimes inside mmap(), sometimes a bit later. > > > > > > Why could this happen ? Is it related to possible stack overflow ? > > > > Yes. The kernel stack is only two pages; you absolutely must not use > > large local variables in the kernel. > > I see. But I still can define them using "static", right ? > > Regards, > Eugene
Re: cluster size
You must be asking why the mbuf cluster size is chosen as 2048, right? It is probably a tradeoff between memory efficiency and speed. -Zhihui On Mon, 23 Jul 2001, vishwanath pargaonkar wrote: > Hi, > in freebsd can we change the cluster size from 2048 > bytes.If yes how can we do that? > do we have to configure in some file? > > TIA > vishwanath
Re: using syscalls in a module (stack problem ?)
Make sense. But there are other things in the UPAGES. -Zhihui On Mon, 23 Jul 2001, Weiguang SHI wrote: > I guess this is it (/usr/src/sys/i386/i386/locore.s): > > 348 /* now running relocated at KERNBASE where the system is linked to > run */ > 349 begin: > 350 /* set up bootstrap stack */ > 351 movl_proc0paddr,%esp/* location of in-kernel > pages */ > 352 addl$UPAGES*PAGE_SIZE,%esp /* bootstrap stack end > location */ > > where UPAGES is defined as 2 in > /usr/src/sys/compile/MYKERNEL/machine/param.h > > 101 #define UPAGES 2 /* pages of u-area */ > > Regards, > Weiguang > > >From: Zhihui Zhang <[EMAIL PROTECTED]> > >To: [EMAIL PROTECTED], "Eugene L. Vorokov" <[EMAIL PROTECTED]> > >CC: [EMAIL PROTECTED] > >Subject: Re: using syscalls in a module (stack problem ?) > >Date: Mon, 23 Jul 2001 12:07:47 -0400 (EDT) > > > > > >Just out of curiosity, Linux's kernel stack is one page. Where in the > >kernel source code that says that we can have two pages instead of one > >page kernel stack? > > > >-Zhihui > > > > > >On Mon, 23 Jul 2001, Eugene L. Vorokov wrote: > > > > > > > I call this function with (curproc, PATH_MAX+1), and everything is > >fine > > > > > when I have just a few local variables defined in the caller (it all > > > > > works on MOD_LOAD only). However, if I have 2 buffers, 4096 bytes > >each, > > > > > as local variables and then try to allocate userspace memory the > >same > > > > > way, kernel crashes - sometimes inside mmap(), sometimes a bit > >later. > > > > > > > > > > Why could this happen ? Is it related to possible stack overflow ? > > > > > > > > Yes. The kernel stack is only two pages; you absolutely must not use > > > > large local variables in the kernel. > > > > > > I see. But I still can define them using "static", right ? 
> > > > > > Regards, > > > Eugene > > > > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > > with "unsubscribe freebsd-hackers" in the body of the message > > > > > > > > >To Unsubscribe: send mail to [EMAIL PROTECTED] > >with "unsubscribe freebsd-hackers" in the body of the message > > > _ > Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp > > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: cluster size
On Tue, 24 Jul 2001, Terry Lambert wrote: > Zhihui Zhang wrote: > > > Hi, > > > in freebsd can we change the cluster size from 2048 > > > bytes.If yes how can we do that? > > > do we have to configure in some file? > > > > You must be asking why the mbuf cluster size is chosen as 2048, right? It > > is probably a tradeoff between memory efficient and speed. > > Ask yourselves: > > "What is the minimum cluster size I would have to have >to be able to contain the maximum MTU worth of data, >yet remain an even multiple of sizeof(mbuf) -- 256 >bytes?" A dumb question: why even not odd multiple? -Zhihui To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: cluster size
I see. It has something to do with the power-of-two allocator we are using inside the kernel. -Zhihui On Wed, 25 Jul 2001, Bosko Milekic wrote: > > On Wed, Jul 25, 2001 at 01:51:51PM -0400, Zhihui Zhang wrote: > > > > > > On Tue, 24 Jul 2001, Terry Lambert wrote: > > > > > Zhihui Zhang wrote: > > > > > Hi, > > > > > in freebsd can we change the cluster size from 2048 > > > > > bytes.If yes how can we do that? > > > > > do we have to configure in some file? > > > > > > > > You must be asking why the mbuf cluster size is chosen as 2048, right? It > > > > is probably a tradeoff between memory efficient and speed. > > > > > > Ask yourselves: > > > > > > "What is the minimum cluster size I would have to have > > >to be able to contain the maximum MTU worth of data, > > >yet remain an even multiple of sizeof(mbuf) -- 256 > > >bytes?" > > > > A dumb question: why even not odd multiple? > > > > -Zhihui > > It actually has to do with the fact that 2K is the only size equal to > or greater than the maximum MTU worth of data that can be multiplied to a page > size without any leftover (in other words, page size modulo 2K is zero). > > -- > Bosko Milekic > [EMAIL PROTECTED] > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-hackers" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
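The constraint discussed in this thread (hold a full MTU of data, be a power of two, and divide the page size with nothing left over) can be checked with a few lines of arithmetic. This is only a sketch: the function name smallest_cluster_size is hypothetical, and the 1500-byte Ethernet MTU and 4096-byte page used in the example call are assumptions, not values taken from the messages above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest power of two that is >= min_bytes and divides page_size
 * evenly. Returns 0 if no such size exists.
 */
size_t smallest_cluster_size(size_t min_bytes, size_t page_size)
{
    assert(page_size != 0);
    /* sz != 0 guards against the shift wrapping around. */
    for (size_t sz = 1; sz != 0 && sz <= page_size; sz <<= 1) {
        if (sz >= min_bytes && page_size % sz == 0)
            return sz;
    }
    return 0;
}
```

With those assumed values, smallest_cluster_size(1500, 4096) gives 2048, which is also a whole multiple of the 256-byte mbuf size mentioned in the thread.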
Re: cluster size
I thought doing a memory free is always safe in an interrupt context. Now it seems doing an allocation of memory is safe too. Does MCLGET() call vm_page_alloc() or malloc() eventually? If so, it might block. -Zhihui On Thu, 26 Jul 2001, Terry Lambert wrote: > Bosko Milekic wrote: > > > > Er, wouldn't that be the only way for cards to refil thier DMA > > > > recieve buffers? > > > > > > Look at the Tigon II and FXP drivers. The allocations in > > > the macros turn into m_get, not m_clusterget. > > > > From if_fxp.c (fxp_add_rfabuf(), sometimes called from fxp_intr()): > > > > MGETHDR(...); <-- get mbuf > > if (m != NULL) { > > MCLGET(...); <-- get cluster > > ... > > } > > Yes, I had misread things. Alfred pointed this out to me in > person, earlier. I had been reading the jumbogram code, > which uses a seperate buffer space, and then just incorrectly > assumed. > > Thanks for getting thecorrection into the list archives, so > that future readers will be less confused: you spared me > having to do the same. > > -- Terry > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-hackers" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Allocate a page at interrupt time
FreeBSD cannot allocate from the PQ_CACHE queue in an interrupt context. Can anyone explain to me why this is the case? Thanks, -Zhihui
ata0-master: non aligned DMA transfer attempted
I write a program that writes into a raw device directly. Although the program runs OK, the system prints messages like: ata0-master: non aligned DMA transfer attempted What exactly happens here? Is there any problem in my program? Thanks. -Zhihui To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: ata0-master: non aligned DMA transfer attempted
Thanks for your reply. I used gdb to find out that the buffer address is not 16-byte aligned. This leads to the question of how to align a statically allocated data structure properly. Using a union seems to align the structure on a long boundary (or even long long?), but that is not 16-byte alignment. union { my_data_structure_t xyz; long pad; } Natural alignment seems to work only for primitive data types. If you define: unsigned char sector_buf[512]; it will not always be aligned on a 512-byte boundary; even 16-byte alignment is not guaranteed. Is there a way to achieve this? -Zhihui On Fri, 24 Aug 2001, Julian Elischer wrote: > Zhihui Zhang wrote: > > > > I write a program that writes into a raw device directly. Although the > > program runs OK, the system prints messages like: > > > > ata0-master: non aligned DMA transfer attempted > make sure your DMA buffer is alligned on a 64 byte boundary... > (a page would be best) > and that you are transferring an exact bultiple of 512 bytes. > > The DMA hardware on some macines cannot handle a buffer on less than 16 byte > allignment, (some on odd allignment,.. (it's a bit hardware dependent). > > so be safe and allign your buffers. > > > when it detects it cannot do it, i used PIO instead, so your data is still > transferred... > > > > > What exactly happens here? Is there any problem in my program? > > > > Thanks. > > > > -Zhihui > > -- > ++ __ _ __ > | __--_|\ Julian Elischer | \ U \/ / hard at work in > | / \ [EMAIL PROTECTED] +-->x USA\ a very strange > | ( OZ)\___ ___ | country ! > +- X_.---._/presently in San Francisco \_/ \\ > v
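For the statically allocated case asked about here, compilers that implement C11 (which postdates this thread) accept an explicit alignment request on the declaration. The following is a minimal sketch under that assumption; DMA_ALIGN and is_aligned are illustrative names, and the 16-byte value is only a stand-in for whatever the hardware actually requires.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_ALIGN 16   /* hypothetical requirement; hardware dependent */

/* Static sector buffer forced onto a DMA_ALIGN-byte boundary (C11). */
alignas(DMA_ALIGN) unsigned char sector_buf[512];

/* Return 1 if p is a multiple of align; align must be a power of two. */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

A quick is_aligned(sector_buf, DMA_ALIGN) check before issuing the raw write confirms the buffer meets the assumed boundary.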
Re: ata0-master: non aligned DMA transfer attempted
On Sun, 26 Aug 2001, Julian Elischer wrote: > Zhihui Zhang wrote: > > > > Thanks for your replay. I use gdb to find out that the buffer address is > > not 16-byte aligned. This leads to a question as to how to align a > > statically allocated data structure properly. Using union seems to be able > > to align you on a long boundary (or even long long?), but that is not 16 > > byte aligned. > > > > union { > > my_data_structure_t xyz; > > long pad; > > } > > > > The natural alignment seems to work only on primitive data types. If you > > define: > > > > unsigned char sector_buf[512]; > > > > It will not always be aligned on a 512 byte boundary, even 16-byte > > alignment is not guaranteed. Is there a way to achieve this? > > unfortunatly not, except to allocate N+16 bytes, and allign it yourself by > > using a 2nd variable.. > > x = malloc(buffesize + 16) > y = x + 15 & ~15 > ... > write (fd, y, buffersize); > ... > free (x); > exit(); > > > You may experiment to see what allignment your hardware needs... > 2?, 4?, 6?, 16? > > when does the message happen? I believe that message is from ata_dmasetup(): if (((uintptr_t)data & scp->alignment) || (count & scp->alignment)) { ata_printf(scp, device, "non aligned DMA transfer attempted\n"); return -1; } The user address obtained by static allocation is not 16-byte aligned. The kernel routine physio() grabs a physical buffer to do DMA, but it still uses the user's address. The KVA associated with the buffer is not used. -Zhihui To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
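The over-allocate-and-round-up idea quoted above needs an integer cast to compile, since pointers cannot be masked directly. A minimal sketch of that approach follows; the names align_up and alloc_aligned are made up for illustration, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round p up to the next multiple of align (a power of two). */
void *align_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    return (void *)((addr + (align - 1)) & ~(uintptr_t)(align - 1));
}

/*
 * Allocate size usable bytes starting on an align-byte boundary.
 * *raw receives the pointer that must later be passed to free();
 * the returned aligned pointer must NOT be freed directly.
 */
void *alloc_aligned(size_t size, size_t align, void **raw)
{
    assert(raw != NULL);
    *raw = malloc(size + align - 1);   /* worst case wastes align - 1 bytes */
    if (*raw == NULL)
        return NULL;
    return align_up(*raw, align);
}
```

A caller would pass the aligned pointer to write() and the raw pointer to free(). On systems that provide it, posix_memalign() achieves the same result without the second pointer.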