Why does printf(9) hang network?
Why would doing a printf(9) in a device driver (usb, firewire, probably others) cause an obscenely long lockout on /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)?

Printf(9) alone isn't the problem: adding printfs to chown(2) does not cause the problem, but printfs from device drivers do.

Grep says that uipc_sockbuf.c is the only file that locks/unlocks sb_sx. The device drivers and printf don't even know that sb_sx exists.

    135 int
    136 sblock(struct sockbuf *sb, int flags)
    137 {
    138
    139         KASSERT((flags & SBL_VALID) == flags,
    140             ("sblock: flags invalid (0x%x)", flags));
    141
    142         if (flags & SBL_WAIT) {
    143                 if ((sb->sb_flags & SB_NOINTR) ||
    144                     (flags & SBL_NOINTR)) {
    145                         sx_xlock(&sb->sb_sx);
    146                         return (0);
    147                 }
    148                 return (sx_xlock_sig(&sb->sb_sx));
    149         } else {
    150                 if (sx_try_xlock(&sb->sb_sx) == 0)
    151                         return (EWOULDBLOCK);
    152                 return (0);
    153         }
    154 }

More info at: http://www.freebsd.org/cgi/query-pr.cgi?pr=118093

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Why does printf(9) hang network?
Robert writes:

>> Why would doing a printf(9) in a device driver (usb, firewire, probably others) cause an obscenely long lockout on /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)?
>>
>> Printf(9) alone isn't the problem: adding printfs to chown(2) does not cause the problem, but printfs from device drivers do.
>>
>> Grep says that uipc_sockbuf.c is the only file that locks/unlocks sb_sx. The device drivers and printf don't even know that sb_sx exists.
>
> I can't speak to the details of your situation, but one possible explanation might be: printf runs at the speed of the console, which for serial consoles can be extremely slow.

But shouldn't the RS-232 driver just fill up the UART's FIFO and then sleep? Why can't the network code run while the RS-232 driver sleeps? Actually, this must be happening in the call-printf-from-chown case. There must be something different when printf is called from a device driver.

> Device driver interrupt threads can preempt other threads, possibly while those threads hold locks. That causes them to hold the locks for much longer, as the threads may not get rescheduled for some period (for example, until the device driver is done doing a printf),

If the CPU is mostly idle, shouldn't the network thread get scheduled right away?

> leading other threads waiting for that lock to wait significantly longer. Especially the case if the other thread was spinning adaptively, in which case it will then yield since the holder of the lock effectively yielded.

My head is spinning attempting to understand this...

> You might try forcing all the various threads to run on different CPUs using cpuset and see if the variance goes down.

Uniprocessor.

> You can also use KTR + schedgraph to explore the specific scheduling going on, although be aware that KTR can also noticeably perturb scheduling itself.

Scheduling? The CPU can be mostly idle and the problem still happens.

> In general, things shouldn't call kernel printf in steady state operation; if they need to log something, they should use log(9) or similar. printf is primarily a tool for printing out device probe information, and for debugging purposes: it is not intended to be fast.

Sounds fine to me. Is there a consensus on this? If so, does this need to go into some developer's handbook? How do we get developers to fix the existing code?
witness Re: Why does printf(9) hang network?
I received a suggestion to try witness, so I built a kernel with WITNESS, WITNESS_KDB, KDB, DDB, KDB_TRACE, and DDB_NUMSYM. This is my first attempt to use witness, so if I got something wrong let me know.

Didn't quite make it all the way up to a multiuser prompt:

    Starting syslogd.
    Starting rpcbind.
    lock order reversal:
     1st 0xff8029549320 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2559
     2nd 0xff000498b000 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:285
    KDB: stack backtrace:
    db_trace_self_wrapper() at 0x801dab0a = db_trace_self_wrapper+0x2a
    _witness_debugger() at 0x805a144c = _witness_debugger+0x2c
    witness_checkorder() at 0x805a24af = witness_checkorder+0x66f
    _sx_xlock() at 0x8056b6d4 = _sx_xlock+0x34
    ufsdirhash_acquire() at 0x8076f833 = ufsdirhash_acquire+0x33
    ufsdirhash_add() at 0x8076fd99 = ufsdirhash_add+0x19
    ufs_direnter() at 0x80772498 = ufs_direnter+0x848
    ufs_mkdir() at 0x807783d6 = ufs_mkdir+0x5e6
    VOP_MKDIR_APV() at 0x808650d4 = VOP_MKDIR_APV+0x34
    kern_mkdirat() at 0x805eb740 = kern_mkdirat+0x270
    syscall() at 0x8081ec5e = syscall+0x19e
    Xfast_syscall() at 0x80806ab1 = Xfast_syscall+0xe1
    --- syscall (136, FreeBSD ELF64, mkdir), rip = 0x80072c53c, rsp = 0x7fffec88, rbp = 0x7fffef66 ---
    KDB: enter: witness_checkorder
    [thread pid 1255 tid 100076 ]
    Stopped at 0x8059083d = kdb_enter+0x3d: movq $0,0x6508a0(%rip)
    db>

Managed to reboot to single user mode, changed /boot/kernel back to my production kernel, and got another lock order reversal rebooting:

    # sync
    # reboot
    Waiting (max 60 seconds) for system process `vnlru' to stop...done
    Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
    Waiting (max 60 seconds) for system process `syncer' to stop...
    Syncing disks, vnodes remaining...0 done
    All buffers synced.
    lock order reversal:
     1st 0xff0004831448 ufs (ufs) @ /usr/src/sys/kern/vfs_mount.c:1200
     2nd 0xff0004831d80 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2083
    KDB: stack backtrace:
    db_trace_self_wrapper() at 0x801dab0a = db_trace_self_wrapper+0x2a
    _witness_debugger() at 0x805a144c = _witness_debugger+0x2c
    witness_checkorder() at 0x805a24af = witness_checkorder+0x66f
    __lockmgr_args() at 0x80552054 = __lockmgr_args+0xd04
    vop_stdlock() at 0x805d9239 = vop_stdlock+0x39
    VOP_LOCK1_APV() at 0x80864f56 = VOP_LOCK1_APV+0x46
    _vn_lock() at 0x805f3cc7 = _vn_lock+0x47
    vget() at 0x805e8856 = vget+0x56
    devfs_allocv() at 0x804fa993 = devfs_allocv+0x103
    devfs_root() at 0x804f9268 = devfs_root+0x48
    dounmount() at 0x805e3369 = dounmount+0x419
    vfs_unmountall() at 0x805e82a2 = vfs_unmountall+0x42
    boot() at 0x80564bd3 = boot+0x683
    reboot() at 0x80564ef8 = reboot+0x68
    syscall() at 0x8081ec5e = syscall+0x19e
    Xfast_syscall() at 0x80806ab1 = Xfast_syscall+0xe1
    --- syscall (55, FreeBSD ELF64, reboot), rip = 0x80078f83c, rsp = 0x7fffece8, rbp = 0 ---
    KDB: enter: witness_checkorder
    [thread pid 35 tid 100073 ]
    Stopped at 0x8059083d = kdb_enter+0x3d: movq $0,0x6508a0(%rip)
    db>
    lock order reversal:
     1st 0xff0004831da8 vnode interlock (vnode interlock) @ /usr/src/sys/fs/devfs/devfs_vnops.c:349
     2nd 0xff8000248858 firewire (firewire) @ /usr/src/sys/dev/firewire/fwohci.c:2227
    KDB: stack backtrace:
    db_trace_self_wrapper() at 0x801dab0a = db_trace_self_wrapper+0x2a
    _witness_debugger() at 0x805a144c = _witness_debugger+0x2c
    witness_checkorder() at 0x805a24af = witness_checkorder+0x66f
    _mtx_lock_flags() at 0x80557b52 = _mtx_lock_flags+0x32
    fwohci_poll() at 0x8035feb1 = fwohci_poll+0x31
    dcons_cngetc() at 0x80303d69 = dcons_cngetc+0x59
    cncheckc() at 0x8052d425 = cncheckc+0x65
    cngetc() at 0x8052d44c = cngetc+0x1c
    db_readline() at 0x801d9ef7 = db_readline+0x77
    db_read_line() at 0x801da975 = db_read_line+0x15
    db_command_loop() at 0x801d8ad8 = db_command_loop+0x38
    db_trap() at 0x801daa49 = db_trap+0x89
    kdb_trap() at 0x80590665 = kdb_trap+0x95
    trap() at 0x8081f200 = trap+0x170
    calltrap() at 0x808067d3 = calltrap+0x8
    --- trap 0x3, rip = 0x8059083d, rsp = 0xff80405d9710, rbp = 0xff80405d9730 ---
    kdb_enter() at 0x8059083d = kdb_enter+0x3d
    witness_checkorder() at 0x805a24af = witness_checkorder+0x66f
    __lockmgr_args() at 0x80552054 = __lockmgr_args+0xd04
    vop_stdlock() at 0x805d9239 = vop_stdlock+0x39
    VOP_LOCK1_APV() at 0x80864f56 = VOP_LOCK1_APV+0x46
    _vn_lock() at 0x805f3cc7 = _vn_lock+0x47
    vget() at 0x805e8856 = vget+0x56
    devfs_allocv() at 0x804fa993 = devfs_allocv+0x103
    devfs_root() at 0x804f9268 = devfs_root+0x48
    dounmount() at 0x805e3369 = dounmount+0x419
    vfs_unmountall() at 0
Re: memstick.img is bloated with 7% 2K blocks of nulls
Tim Kientzle wrote:
> The current UFS code is designed to leave enough "slack space" to
> support future file writes.

What if you turned the knob all the way down and had just one cylinder group? I assume that newfs would need to be fixed to allow this, but would anything break?

The current limits are also wasteful for normal read/write filesystems with large files. Too many cylinder groups, too many inodes, block/frag size too small, etc...
quotas an essential feature? (was: svn commit: r218953 - stable/8/usr.sbin/sysinstall)
> I promise to enable UFS quotas in GENERIC in one week unless anybody objects now.

Huh? I thought GENERIC was supposed to include everything you needed to boot, not every possible feature that someone might desire? But requests to include things required to boot get rejected and nonessential features like quotas get added. WTF?
Re: Keeping /etc/localtime up-to-date
> And while I (think I) recall that the equivalent of /etc/localtime
> was implemented in some version of SunOS many years ago as a symlink,
> I believe that approach could be problematic for FreeBSD, as it
> could impose some unintended requirements on some of the start-up
> scripts.

I have been running FreeBSD and NetBSD with /etc/localtime being a symlink for years and have not seen any problems as a result.
Re: New Boot-Loader
> Please note that graphical loaders are not very serial console friendly ;-)

Yes! Real computers have RS-232 consoles.

And please stick with plain ASCII text. The current bootloader is at best ugly and at worst unusable on some terminals. AFAIK the bootloader doesn't have termcap/terminfo available. The default needs to work everywhere.

A bootloader does not need to be pretty. If you want a pretty bootloader, put it off to the side and those who both can and want to run it can enable it once the basics are running.
multi-boot bootstrap?
The discussion of a new bootloader reminded me of the following problem:

What we need more than a new bootloader is a new bootstrap.

With MBR, NetBSD's boot selector MBR works reasonably well. (About as well as can be expected given the limited space available.) You get a menu of partitions ("slices" in FreeBSD-speak) and can enter a number to select which one you want to boot. If you don't enter anything, it times out and boots the default. You can boot a different disk by pressing F1, F2, F3 ...

example:

    Fn: diskn
    1: NBSD4.0
    2: NB5.0.1
    3: FBSD7.1
    4: FBSD8.2

The menu labels are limited to 7 chars due to the limited space available in the MBR.

But disks larger than 2 TiB need to be GPT rather than MBR. I haven't found a bootstrap with similar functionality for GPT.

GPT allows a larger bootstrap than MBR, so the bootstrap can be nicer. Firmware disk numbering is completely insane on some machines, so spare the poor user from having to guess which disk is which number today. Go through all the disks and look for bootable partitions. Extract the GPT partition labels for these partitions. Present a menu of choices.

example:

    Enter the menu number for the partition you wish to boot.
    The default will automatically boot in 5 seconds.

    1: FreeBSD 7.1
    2: FreeBSD 8.2 (default)
    3: NetBSD 4.0
    4: NetBSD 5.0.1
    5: OpenBSD
    6: Plan 9
    7: reboot back to firmware

    Boot:

As with the boot loader, this needs to work on all machines and all terminals (without having termcap/terminfo), so just plain ASCII text, no graphics.
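The size limit that pushes large disks from MBR to GPT follows from the 32-bit sector addresses in the MBR partition table. A minimal sketch of that arithmetic, assuming the traditional 512-byte sector; the function name is made up for illustration and is not part of any bootstrap:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest capacity addressable through a 32-bit sector number.
 * sector_size is an assumption; 512 bytes is the traditional value.
 */
static uint64_t
mbr_max_bytes(uint32_t sector_size)
{
    assert(sector_size > 0);
    /* 2^32 addressable sectors, each sector_size bytes long. */
    return ((UINT64_C(1) << 32) * sector_size);
}
```

With 512-byte sectors this comes to 2 TiB; a disk presenting 4096-byte sectors would reach 16 TiB by the same calculation.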
Re: Keeping /etc/localtime up-to-date
>>> And while I (think I) recall that the equivalent of /etc/localtime was implemented in some version of SunOS many years ago as a symlink, I believe that approach could be problematic for FreeBSD, as it could impose some unintended requirements on some of the start-up scripts.
>>
>> I have been running FreeBSD and NetBSD with /etc/localtime being a symlink for years and have not seen any problems as a result.
>
> The one (and only) problem that I've seen from using a symlink for /etc/localtime is that -- since the /usr partition is not mounted early-on -- boot messages get logged in GMT offset until /usr is mounted. However, some simply ignore this.

What boot messages are these?

    grep 2011 /var/run/dmesg.boot
    Copyright (c) 1992-2011 The FreeBSD Project.
    FreeBSD 8.2-RELEASE #9: Sun Mar 6 18:47:36 pst 2011
Re: multi-boot bootstrap?
> Now, how are you going to multiboot OpenBSD and NetBSD on a PowerPC machine from the same hard disk?

I didn't say anything about a requirement for booting multiple OSes from the same disk. I said:

    Go through all the disks and look for bootable partitions.
    Extract the GPT partition labels for these partitions.
    Present a menu of choices.

There can be multiple disks. (Assuming the hardware supports that.)

I haven't worked with PowerPC machines and it has been a very long time since I worked with Sparc. I'm more familiar with Alpha, which would take some hacking to boot more than one OS per disk, but some rocket scientist decided to drop FreeBSD support for Alpha, so I suspect that no one here cares about Alpha.

> From what I know, one or the other can only be as the first entry and it then has to be set from the Forth prompt. So, you will need two disks to boot, say: OpenBSD, NetBSD, FreeBSD, Linux, and MacOSX or a combination of these. On PPC boxes with OpenFirmware 3.x, you actually need to set the active partition from the Forth prompt if you want to boot Linux and/or FreeBSD and both are on the same disk.

Can these PPC boxes boot from GPT disks? "Active partition" sounds MBRish. Perhaps they can use the "protective MBR" trick?
Re: ifconfig output: ipv4 netmask format
Paul Schenkeveld writes:

> Although non-contiguous netmasks are not legal anymore in IPv4, our ifconfig still allows to do something like:
>
>     # ifconfig em0 inet 10.0.5.2 netmask 255.0.255.0
>     # ifconfig em0
>     em0: flags=8843 metric 0 mtu 1500
>             options=219b
>             ether xx:xx:xx:xx:xx:xx
>             inet 10.0.5.2 netmask 0xff00ff00 broadcast 10.255.5.255
>             media: Ethernet autoselect (1000baseT )
>             status: active

If this is no longer legal, should ifconfig issue a warning?

J. Hellenthal writes:

> This is the year 2011 right ? when are we going to support new users rather than supporting old outdated washed up "scripts" ?

Change for the sake of change is not progress. Perhaps when you get more experience you will understand the "joy" of spending massive amounts of time attempting to deal with gratuitous changes.

Personally, I'd prefer to be spending my time fixing things that are truly broken rather than repainting the bikeshed in today's fashionable color. And unfortunately there are things that are badly broken. Things that cause data loss. Hardware that isn't supported properly. Some of these are in the PR database if you need a list of useful things to work on.

As far as ifconfig goes, I'm in the camp that says:

1) Leave the default alone to avoid breaking scripts.
2) Add an option for those who want it. (Put some thought into it, don't just do the first thing that springs to mind.)
3) Those that want a different default can use an alias.
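For the warning question, detecting a non-contiguous mask such as 0xff00ff00 takes only a couple of bit operations. The following is a sketch of one way to do it, not code from ifconfig(8); both function names are hypothetical, and the mask is assumed to be in host byte order:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if mask (host byte order) is a run of 1 bits followed
 * only by 0 bits, e.g. 0xffffff00.  Hypothetical helper.
 */
static bool
netmask_is_contiguous(uint32_t mask)
{
    uint32_t inverted = ~mask;  /* host bits become a low run of 1s */

    /* A low run of 1s plus one shares no bits with the run itself. */
    return ((inverted & (inverted + 1)) == 0);
}

/*
 * Prefix length of a contiguous mask, e.g. 24 for 0xffffff00.
 * Hypothetical helper; only meaningful for contiguous masks.
 */
static int
netmask_prefix_len(uint32_t mask)
{
    int len = 0;

    assert(netmask_is_contiguous(mask));
    /* Shift the leading 1 bits out one at a time and count them. */
    for (; mask != 0; mask <<= 1)
        len++;
    return (len);
}
```

A check like this could sit behind an option or a warning without changing the default output, which keeps existing scripts working.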
*printf(9) and PRINTF_BUFR_SIZE
While working on other problems with *printf(9), log(9), etc., I stumbled upon:

    options PRINTF_BUFR_SIZE=128    # Prevent printf output being interspersed.

Question 1: Am I correct in thinking that PRINTF_BUFR_SIZE is supposed to prevent this:

    ada2: 300.000MB/s transfuhub2: 3 ports with 3 removable, self powered
    ers (SATA 2.x, UDMA6, PIO 8192bytes)
    ada2: Command Queueing enabled

Question 2: Why is vprintf() the only function that does this buffering? As far as I can tell, the various functions that call kvprintf() directly without going through vprintf() do not get buffered. I'm thinking that kvprintf() would be a better place for the buffering. Or would this break something?
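The general idea behind this kind of buffering can be shown in userland: format the whole message into a local buffer, then hand it to the output path in one piece, so another writer cannot land in the middle of it. This is only a sketch of the technique, not the kernel's vprintf() code; `buffered_printf`, `emit_fn`, and `capture_emit` are names invented for the illustration, and BUFR_SIZE simply mirrors the 128 from the option line.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUFR_SIZE 128   /* mirrors PRINTF_BUFR_SIZE=128 */

/* Hypothetical output hook: receives one complete chunk per call. */
typedef void (*emit_fn)(const char *chunk, size_t len, void *arg);

/*
 * Format the message into a local buffer, then emit it with a single
 * call.  Output longer than the buffer is truncated.  Returns the
 * number of bytes emitted.
 */
static size_t
buffered_printf(emit_fn emit, void *arg, const char *fmt, ...)
{
    char buf[BUFR_SIZE];
    va_list ap;
    int n;

    assert(emit != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    if (n < 0)
        return (0);                     /* formatting error */
    if ((size_t)n >= sizeof(buf))
        n = (int)sizeof(buf) - 1;       /* truncated to fit */
    emit(buf, (size_t)n, arg);
    return ((size_t)n);
}

/* Example hook that records what it was given and how often. */
struct capture {
    char text[BUFR_SIZE];
    int calls;
};

static void
capture_emit(const char *chunk, size_t len, void *arg)
{
    struct capture *c = arg;

    memcpy(c->text, chunk, len);        /* len is at most BUFR_SIZE - 1 */
    c->text[len] = '\0';
    c->calls++;
}
```

The point of the hook is that it is called once per message. A path that pushes characters out one at a time gives every other writer a chance to interleave, which is what the broken "transfers" line shows.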
Re: *printf(9) and PRINTF_BUFR_SIZE
>> While working on other problems with *printf(9), log(9), etc., I stumbled upon:
>>
>>     options PRINTF_BUFR_SIZE=128    # Prevent printf output being interspersed.
>>
>> Question 1: Am I correct in thinking that PRINTF_BUFR_SIZE is supposed to prevent this:
>>
>>     ada2: 300.000MB/s transfuhub2: 3 ports with 3 removable, self powered
>>     ers (SATA 2.x, UDMA6, PIO 8192bytes)
>>     ada2: Command Queueing enabled
>>
>> Question 2: Why is vprintf() the only function that does this buffering? As far as I can tell, the various functions that call kvprintf() directly without going through vprintf() do not get buffered. I'm thinking that kvprintf() would be a better place for the buffering. Or would this break something?
>
> http://docs.freebsd.org/cgi/mid.cgi?AANLkTinPhcc8Z_BdvoEQUv-ZXlHAYOTQJwlUQDVO8iJ9

Thanks, Alex! That was a useful thread; I now know more about the problem and how to fix it. I gather the answer to Q1 is yes.

Given that the word "transfers" is broken, I still think this example is most likely due to my changes that use unbuffered kvprintf() rather than buffered vprintf(). So question 2 remains.

BTW, I see some threads where people think this is due to SMP. It happens on uniprocessor machines too.
Need an alternative to DELAY()
FreeBSD 8.2 amd64 uniprocessor

    kernel: siisch1: DISCONNECT requested
    kernel: siisch1: SIIS reset...
    kernel: siisch1: siis_sata_connect() calling DELAY(1000)
    last message repeated 59 times
    kernel: siisch1: SATA connect time=60ms status=0123
    kernel: siisch1: SIIS reset done: devices=0001
    kernel: siisch1: DISCONNECT requested
    kernel: siisch1: SIIS reset...
    kernel: siisch1: siis_sata_connect() calling DELAY(1000)
    last message repeated 58 times
    kernel: siisch1: SATA connect time=59ms status=0123
    ...
    kernel: siisch0: siis_wait_ready() calling DELAY(1000)
    last message repeated 1300 times
    kernel: siisch0: port is not ready (timeout 1ms) status = 001f2000

Meanwhile, *everything* comes to a screeching halt. Device drivers are locked out, and thus incoming data is lost. Losing incoming data is unacceptable.

Need an alternative to DELAY() that does not lock out other device drivers. There must be a way to reset one bit of hardware without locking down the entire machine.
Re: Need an alternative to DELAY()
>> FreeBSD 8.2 amd64 uniprocessor
>>
>>     kernel: siisch1: DISCONNECT requested
>>     kernel: siisch1: SIIS reset...
>>     kernel: siisch1: siis_sata_connect() calling DELAY(1000)
>>     last message repeated 59 times
>>     kernel: siisch1: SATA connect time=60ms status=0123
>>     kernel: siisch1: SIIS reset done: devices=0001
>>     kernel: siisch1: DISCONNECT requested
>>     kernel: siisch1: SIIS reset...
>>     kernel: siisch1: siis_sata_connect() calling DELAY(1000)
>>     last message repeated 58 times
>>     kernel: siisch1: SATA connect time=59ms status=0123
>>     ...
>>     kernel: siisch0: siis_wait_ready() calling DELAY(1000)
>>     last message repeated 1300 times
>>     kernel: siisch0: port is not ready (timeout 1ms) status = 001f2000
>>
>> Meanwhile, *everything* comes to a screeching halt. Device drivers are locked out, and thus incoming data is lost. Losing incoming data is unacceptable.
>>
>> Need an alternative to DELAY() that does not lock out other device drivers. There must be a way to reset one bit of hardware without locking down the entire machine.

Hans Petter Selasky writes:

> An alternative to DELAY() is the simplest solution. You probably need to do some redesign in the SCSI layer to find a better solution.

I keep coming back to the idea that a device driver for one controller should not have to lock out *all* the hardware. RS-232 locks out Ethernet. Disk drivers lock out Ethernet. And so on. Why? Is there some fundamental reason that this *has* to be? I thought the conversion from spl() to mutex() was supposed to fix this?

I'm making progress on my project converting printf(9) calls to log(9), and fixing some bugs along the way. Eventually I'll have patches to submit. But this is really a workaround, not a fix to the underlying problem.

Redesigning the SCSI layer sounds like a job for someone who took a lot more CS classes than I did. /dev/brain returns ENOCLUE. :-(
(no subject)
[ Email attempt #3 and counting... ]

Alexander Motin wrote:

> Warner Losh wrote:
>> I don't suppose that your driver could cause the hardware to interrupt after a little time? That would be more resource friendly... Otherwise, 1ms is long enough that a msleep or tsleep would likely work quite nicely.
>
> It's not his driver, it's mine. Actually, unlike AHCI, this hardware even has interrupt for ready transition (second, biggest of sleeps). But it is not used in present situation.
>
>> On Apr 11, 2011, at 1:43 PM, dieter...@engineer.com wrote:
>>> FreeBSD 8.2 amd64 uniprocessor
>>>
>>>     kernel: siisch1: DISCONNECT requested
>>>     kernel: siisch1: SIIS reset...
>>>     kernel: siisch1: siis_sata_connect() calling DELAY(1000)
>>>     last message repeated 59 times
>>>     kernel: siisch1: SATA connect time=60ms status=0123
>>>     kernel: siisch1: SIIS reset done: devices=0001
>>>     kernel: siisch1: DISCONNECT requested
>>>     kernel: siisch1: SIIS reset...
>>>     kernel: siisch1: siis_sata_connect() calling DELAY(1000)
>>>     last message repeated 58 times
>>>     kernel: siisch1: SATA connect time=59ms status=0123
>>>     ...
>>>     kernel: siisch0: siis_wait_ready() calling DELAY(1000)
>>>     last message repeated 1300 times
>>>     kernel: siisch0: port is not ready (timeout 1ms) status = 001f2000
>>>
>>> Meanwhile, *everything* comes to a screeching halt. Device drivers are locked out, and thus incoming data is lost. Losing incoming data is unacceptable.
>>>
>>> Need an alternative to DELAY() that does not lock out other device drivers. There must be a way to reset one bit of hardware without locking down the entire machine.
>>>
>>> Hans Petter Selasky writes:
>>>> An alternative to DELAY() is the simplest solution. You probably need to do some redesign in the SCSI layer to find a better solution.
>>>
>>> I keep coming back to the idea that a device driver for one controller should not have to lock out *all* the hardware. RS-232 locks out Ethernet. Disk drivers lock out Ethernet. And so on. Why? Is there some fundamental reason that this *has* to be? I thought the conversion from spl() to mutex() was supposed to fix this?
>>>
>>> I'm making progress on my project converting printf(9) calls to log(9), and fixing some bugs along the way. Eventually I'll have patches to submit. But this is really a workaround, not a fix to the underlying problem.
>>>
>>> Redesigning the SCSI layer sounds like a job for someone who took a lot more CS classes than I did. /dev/brain returns ENOCLUE. :-(
>
> CAM is not completely innocent in this situation indeed. CAM defines XPT_RESET_BUS request as synchronous. It is not queued, and called under the SIM mutex lock. I don't think lock can be safely dropped in the middle there. Now I think that I could try to move readiness waiting out of the siis_reset() to do it asynchronously. I'll think about it.
>
> I've fixed this problem for ahci(4) in HEAD, there should be no sleeps longer than 100ms now (typical 1-2ms). With siis(4) the situation is different. There by default should be no sleeps longer than 100ms (typical 1-2ms). Longer sleep means that either controller is not responding, or it can't establish link to device it sees. I've reduced waiting timeout from 10s to 1s. It should improve situation a bit, but I would look for the original problem cause. Have you done something specific to trigger it? Are your drive/cables OK?

Thank you for your prompt attention to this problem, it is very much appreciated. (Losing data sucks.)

However, 100 ms is still way too long (assuming ms = milliseconds). Even 1 millisecond is dangerous; if Ethernet is locked out for approx 4 milliseconds, there is guaranteed data loss. I'd like to see something more like 100 microseconds worst case (for TCP).

A closed-source, closed-hardware black box generates the data. It has a very small output buffer and cannot be changed. In some cases it insists on using UDP rather than TCP, so dropping even a single packet screws up the data. I have cranked the TCP and UDP receive buffer sizes way up, I'm reading the ports at rtprio into a large buffer locked into main memory, etc. etc. Most of the time it works.

But if a device driver takes too long, incoming Ethernet packets do not get serviced in time, and I lose data. A device driver doing printf(9) to the RS-232 console is too slow. Changing printf to log(9) works around this. If a disk controller, port multiplier, or disk has a hiccup, I lose data. Siis(4) is the current problem, but IIRC I've had problems from ahci(4) and ata(4) in the past. I'm currently using all three drivers.

Is there any way I can keep the Ethernet from being locked out by other drivers?
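The difference between spinning in DELAY() and the msleep/tsleep approach Warner suggests can be illustrated in userland: poll for readiness with a bound, but give up the CPU between polls instead of busy-waiting. This is only an analogy under stated assumptions, not driver code. `wait_ready_sleeping`, `ready_fn`, and `countdown_ready` are invented names, nanosleep(2) stands in for a kernel sleep, and, as Alexander notes above, whether the real reset path may sleep at all depends on the locks held at that point.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L     /* for nanosleep() */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical readiness probe; a driver would read a status register. */
typedef bool (*ready_fn)(void *arg);

/*
 * Poll ready() up to max_polls times, sleeping interval_us between
 * polls instead of spinning.  Returns the number of polls used, or -1
 * on timeout.
 */
static int
wait_ready_sleeping(ready_fn ready, void *arg, int max_polls, long interval_us)
{
    struct timespec ts;
    int i;

    assert(ready != NULL && max_polls > 0 && interval_us >= 0);
    ts.tv_sec = interval_us / 1000000;
    ts.tv_nsec = (interval_us % 1000000) * 1000;
    for (i = 1; i <= max_polls; i++) {
        if (ready(arg))
            return (i);
        if (i < max_polls)
            nanosleep(&ts, NULL);   /* let other work run while waiting */
    }
    return (-1);
}

/* Example probe: reports ready once the counter reaches zero. */
static bool
countdown_ready(void *arg)
{
    int *remaining = arg;

    if (*remaining == 0)
        return (true);
    (*remaining)--;
    return (false);
}
```

Sleeping trades latency precision for availability: the waiter may wake later than the interval asked for, but other threads can run in the meantime, which is the property missing from the DELAY(1000) loops in the log above.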
Re: Add SUM sysctl
> once you reboot into SUM to install world, you are doomed, BECAUSE
> ...
> Kernel will bitch (GELI part), about world->kernel mismatch and you
> won't be able to install world as you can't decrypt geom providers!!

Suggestion 1: Install the new stuff into different disk partition(s), leaving the partition(s) you are currently running alone. Then if something doesn't work and the new installation doesn't boot, you are not doomed; you can simply boot the previous partition(s) again.

Suggestion 2: The kernel may not have an official flag for single vs multi user mode, but you can fake it. Try something like "pgrep syslogd". If syslogd is running, assume multiuser mode. If syslogd is not running, assume single user mode.