Stuck in "objtrm"
I have an old 486 here that I thrash to death occasionally. Well, at least I try to get it to page to death. I started a make world last week and forgot about it.

Today I noticed that it's been stuck for most of the week. Almost everything is fine, but one cc1 process is stuck in "objtrm". Oh, and I hung a "cat /proc/31624/map", too, trying to get some details (now stuck in "thrd_sleep").

So, am I just tripping over some old long-fixed bug? Or is this a new one worth investigating? The kernel is from 1999/06/16 (just before the vfs_cluster.c commit).

Stephen.

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Stuck in "objtrm"
On Friday, 2nd July 1999, Stephen McKay wrote:

>I have an old 486 here that I thrash to death occasionally. Well, at least
>I try to get it to page to death. I started a make world last week and
>forgot about it.
>
>Today I noticed that it's been stuck for most of the week. Almost everything
>is fine, but one cc1 process is stuck in "objtrm". Oh, and I hung a "cat
>/proc/31624/map", too, trying to get some details (now stuck in "thrd_sleep").
>
>So, am I just tripping over some old long-fixed bug? Or is this a new one
>worth investigating? The kernel is from 1999/06/16 (just before the
>vfs_cluster.c commit).

Well, it's happened again, but this time it is a recent -current, less than a day old. After a couple of hours of heavy paging (yes, this is a slow box), the make world hangs with cc1 in "objtrm". All the other processes seem to be waiting for it to exit. It's the only cc1 around, by the way, even though it was a -j5 parallel compile.

All other machine functions are fine. ps, top, vmstat, et al show normal looking values.

Does anybody have any hints on how to debug this? I know that "objtrm" implies that paging is in progress on some object, even though there's no paging happening, and so it's probably an accounting error with object->paging_in_progress. But other than that, I'm not sure where to look.

Stephen.
Re: Stuck in "objtrm"
On Tuesday, 6th July 1999, Stephen McKay wrote:

>the make world hangs with cc1 in "objtrm"...

I'm having a fun old conversation with myself here! ;-)

Here's some concrete info:

(kgdb) p/x *(struct vm_object*) 0xc32ea21c
$13 = {object_list = {tqe_next = 0xc3389e58, tqe_prev = 0xc323fdec},
  shadow_head = {tqh_first = 0x0, tqh_last = 0xc32ea224},
  shadow_list = {tqe_next = 0xc327b8dc, tqe_prev = 0xc32cb734},
  memq = {tqh_first = 0xc0308e80, tqh_last = 0xc03046ec},
  generation = 0x3004, type = 0x1, size = 0x2a7, ref_count = 0x0,
  shadow_count = 0x0, pg_color = 0x5, hash_rand = 0xfd9a69d7,
  flags = 0x21c8, paging_in_progress = 0x1, behavior = 0x0,
  resident_page_count = 0x9, backing_object = 0x0,
  backing_object_offset = 0x0, last_read = 0x14,
  pager_object_list = {tqe_next = 0xc323c438, tqe_prev = 0xc323a424},
  handle = 0x0,
  un_pager = {vnp = {vnp_size = 0x16},
    devp = {devp_pglist = {tqh_first = 0x16, tqh_last = 0x0}},
    swp = {swp_bcount = 0x16}}}

The high points:

    ref_count=0
    shadow_count=0
    type=1 (OBJT_SWAP)
    paging_in_progress=1
    resident_page_count=9
    flags=0x21c8 (onemapping, mightbedirty, writeable, pipwnt, dead)

A typical memory page from this object:

(kgdb) p/x *(struct vm_page*) 0xc02ffd90
$14 = {pageq = {tqe_next = 0xc0317dc0, tqe_prev = 0xc02f1960}, hnext = 0x0,
  listq = {tqe_next = 0xc0317dc0, tqe_prev = 0xc02f196c},
  object = 0xc32ea21c, pindex = 0x2f, phys_addr = 0x4f4000, queue = 0x41,
  flags = 0x0, pc = 0x34, wire_count = 0x0, hold_count = 0x0,
  act_count = 0x8, busy = 0x0, valid = 0xff, dirty = 0xff}

The high points:

    queue=inactive
    flags=0
    wire_count=0
    hold_count=0
    busy=0
    valid=ff
    dirty=ff

All 9 of them are like that. So, no busy or PG_BUSY or anything. No paging really in progress after all. So the object's paging_in_progress count is out.

Who was watching what code changed recently? Remember I had this problem on a kernel from 1999/06/16 too. So it's an "old" problem.

Off to research the next installment...

Stephen.

PS I haven't worked out yet how to find the stack of the errant process. Any hints? The stack trace should be helpful.
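The flags decoding above can be reproduced mechanically. What follows is a minimal sketch, not kernel code: the bit values in the table are assumptions chosen to agree with the decoding quoted in the message (0x21c8 = onemapping, mightbedirty, writeable, pipwnt, dead), and the names `obj_flags` and `decode_obj_flags` are hypothetical. Check vm/vm_object.h in the tree being debugged before trusting the values.

```c
#include <stddef.h>
#include <string.h>

/*
 * Assumed bit values, picked to match the decoding in the message.
 * They are NOT copied from the kernel sources in question.
 */
struct flag_name {
    unsigned int bit;
    const char *name;
};

static const struct flag_name obj_flags[] = {
    { 0x2000, "onemapping" },
    { 0x0100, "mightbedirty" },
    { 0x0080, "writeable" },
    { 0x0040, "pipwnt" },
    { 0x0008, "dead" },
};

/*
 * Write the names of all known bits set in `flags` into `buf`,
 * separated by single spaces. Returns the bits that were not recognised,
 * so a non-zero result means the table is incomplete.
 */
unsigned int decode_obj_flags(unsigned int flags, char *buf, size_t len)
{
    size_t i;

    if (len == 0)
        return flags;
    buf[0] = '\0';
    for (i = 0; i < sizeof(obj_flags) / sizeof(obj_flags[0]); i++) {
        if (flags & obj_flags[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, " ", len - strlen(buf) - 1);
            strncat(buf, obj_flags[i].name, len - strlen(buf) - 1);
            flags &= ~obj_flags[i].bit;
        }
    }
    return flags;
}
```

Calling `decode_obj_flags(0x21c8, buf, sizeof buf)` with a 128-byte buffer leaves no unrecognised bits, which is at least consistent with the hand decoding.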
Re: Stuck in "objtrm"
On Tuesday, 6th July 1999, Andrew Gallatin wrote:

>Yes. say 'proc pidhashtbl[PID & pidhash]->lh_first' in kgdb.
>I suspect that it will be in exit() also..

Magic! It looks like a plain old exit() to me.

(kgdb) proc pidhashtbl[27157&pidhash]->lh_first
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:827
#1  0xc014a5bd in tsleep (ident=0xc32ea21c, priority=4,
    wmesg=0xc023db84 "objtrm", timo=0) at ../../kern/kern_synch.c:443
#2  0xc01e9741 in vm_object_terminate (object=0xc32ea21c)
    at ../../vm/vm_object.h:230
#3  0xc01e96f1 in vm_object_deallocate (object=0xc32ea21c)
    at ../../vm/vm_object.c:382
#4  0xc01e6acb in vm_map_entry_delete (map=0xc3047440, entry=0xc3240190)
    at ../../vm/vm_map.c:1680
#5  0xc01e6c89 in vm_map_delete (map=0xc3047440, start=0, end=3217022976)
    at ../../vm/vm_map.c:1783
#6  0xc01e6d1d in vm_map_remove (map=0xc3047440, start=0, end=3217022976)
    at ../../vm/vm_map.c:1808
#7  0xc0141d20 in exit1 (p=0xc322f0a0, rv=0) at ../../kern/kern_exit.c:220
#8  0xc0141b24 in exit1 (p=0xc322f0a0, rv=-1021614488)
    at ../../kern/kern_exit.c:106
#9  0xc020e41a in syscall (frame={tf_fs = 47, tf_es = 137297967,
      tf_ds = -1078001617, tf_edi = 136021320, tf_esi = 0,
      tf_ebp = -1077947348, tf_isp = -1020915756, tf_ebx = -1,
      tf_edx = 135690384, tf_ecx = 136200192, tf_eax = 1, tf_trapno = 12,
      tf_err = 2, tf_eip = 135656524, tf_cs = 31, tf_eflags = 582,
      tf_esp = -1077947368, tf_ss = 47}) at ../../i386/i386/trap.c:1056
#10 0xc0202cc0 in Xint0x80_syscall ()
error reading /proc/27157/mem
Re: Stuck in "objtrm" - live kernel test to run
On Thursday, 8th July 1999, Matthew Dillon wrote:

>There is a way we can find out for sure. For any of you with processes
>stuck in objtrm, see if you can gdb the kernel and get a backtrace
>of that process to see if it might be in a state where a previous
>call context is holding a PIP count on the object.

Just for completeness, here's mine again, done with your ps trick:

(kgdb) back
#0  mi_switch () at ../../kern/kern_synch.c:827
#1  0xc014a5bd in tsleep (ident=0xc32ea21c, priority=4,
    wmesg=0xc023db84 "objtrm", timo=0) at ../../kern/kern_synch.c:443
#2  0xc01e9741 in vm_object_terminate (object=0xc32ea21c)
    at ../../vm/vm_object.h:230
#3  0xc01e96f1 in vm_object_deallocate (object=0xc32ea21c)
    at ../../vm/vm_object.c:382
#4  0xc01e6acb in vm_map_entry_delete (map=0xc3047440, entry=0xc3240190)
    at ../../vm/vm_map.c:1680
#5  0xc01e6c89 in vm_map_delete (map=0xc3047440, start=0, end=3217022976)
    at ../../vm/vm_map.c:1783
#6  0xc01e6d1d in vm_map_remove (map=0xc3047440, start=0, end=3217022976)
    at ../../vm/vm_map.c:1808
#7  0xc0141d20 in exit1 (p=0xc322f0a0, rv=0) at ../../kern/kern_exit.c:220
#8  0xc0141b24 in exit1 (p=0xc322f0a0, rv=-1021614488)
    at ../../kern/kern_exit.c:106
#9  0xc020e41a in syscall (frame={tf_fs = 47, tf_es = 137297967,
      tf_ds = -1078001617, tf_edi = 136021320, tf_esi = 0,
      tf_ebp = -1077947348, tf_isp = -1020915756, tf_ebx = -1,
      tf_edx = 135690384, tf_ecx = 136200192, tf_eax = 1, tf_trapno = 12,
      tf_err = 2, tf_eip = 135656524, tf_cs = 31, tf_eflags = 582,
      tf_esp = -1077947368, tf_ss = 47}) at ../../i386/i386/trap.c:1056
#10 0xc0202cc0 in Xint0x80_syscall ()
error reading /proc/27157/mem

And for extra points:

(kgdb) frame 4
#4  0xc01e6acb in vm_map_entry_delete (map=0xc3047440, entry=0xc3240190)
    at ../../vm/vm_map.c:1680
1680            vm_object_deallocate(entry->object.vm_object);
(kgdb) p/x *entry
$10 = {prev = 0xc3047460, next = 0xc3249e38, start = 0x81c6000,
  end = 0x8458000, avail_ssize = 0x0,
  object = {vm_object = 0xc32ea21c, sub_map = 0xc32ea21c},
  offset = 0x15000, eflags = 0x0, protection = 0x7, max_protection = 0x7,
  inheritance = 0x1, wired_count = 0x0}

I haven't made any clever conclusions from this, but you might do better.

>Note: the process cannot be swapped out, so if you've had a process
>stuck in objtrm for a long time try doing as "ps axfl" to force its
>upages in and then gdb should be able to backtrace it. The 'f' in the ps
>does that.

Cute. After the ps axlf, all the swapped out processes went from 0 to 8 KB resident. But the stuck process stayed at 0 KB resident. It wasn't swapped out anyway, according to the ps flags, so it should have had some resident pages. Seems like a contradiction to me.

Stephen.
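One small observation can be checked by arithmetic on the numbers in the dumps. This is only a sketch under the assumption of 4 KB pages (true of i386); the helper name `entry_to_object_pages` and the `obj_range` struct are hypothetical. It converts the map entry's start, end and offset into a range of page indices within the backing object.

```c
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12   /* assumption: 4 KB pages, as on i386 */

/* Half-open range [first_page, end_page) of object page indices. */
struct obj_range {
    uint32_t first_page;
    uint32_t end_page;
};

/*
 * Map a vm_map_entry's address range and object offset onto page
 * indices in the backing object.
 */
struct obj_range entry_to_object_pages(uint32_t start, uint32_t end,
                                       uint32_t offset)
{
    struct obj_range r;

    r.first_page = offset >> PAGE_SHIFT_ASSUMED;
    r.end_page = r.first_page + ((end - start) >> PAGE_SHIFT_ASSUMED);
    return r;
}
```

With start = 0x81c6000, end = 0x8458000 and offset = 0x15000 this gives pages 0x15 up to (but not including) 0x2a7. The end matches the object's size = 0x2a7 from the earlier vm_object dump, and the sample page's pindex = 0x2f lies inside the range, so the entry and the object at least agree with each other. That says nothing about why paging_in_progress is stuck at 1.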
Re: Stuck in "objtrm" - live kernel test to run
On Saturday, 10th July 1999, Matthew Dillon wrote:

>I'm trying to simulate your 486 setup. You must love pain! A make -j5
>buildworld on a 16MB-limited machine pages like hell (200-400 pageins/sec
>AND 200-400 pageouts/sec simultaneously, almost continuously).

Maximal pain, maximal gain! :-) The only reason I'm using a big, powerful 486 is that my 386 here died and there were none left to replace it. With NFS src and obj, make world was taking over a week. No joking.

>Are you
>using any special sysctls or special kernel config options?

I have been using "sysctl -w vm.swap_async_max=2" for a while. It seems to help throughput on this machine, and definitely helps interactive performance. I suspect that a few extra I/O limiters, or some sort of I/O rate quota system would help enforce fairness even on faster machines. For example, we have a performance anomaly with squid on 3.2 that could be over-eager pagedaemon behaviour flooding the I/O system.

>Also, try the latest -CURRENT and see if you can still get it stuck in
>objtrm. I haven't had any luck so far in my simulation. If you still
>get stuck in objtrm then try Alan's patch and see if that has an effect.

Maybe you should send me your latest patch, the atomic_* fixer, and I'll give it a whirl. It hasn't turned up in the cvs-cur CTM patches yet.

Stephen.
Softupdates reliability?
I had a recent crash on my home box that makes me question the reliability of softupdates. My home box runs 3.2-R, but as far as I can determine, there have been no reliability fixes to softupdates since then. So a failure here should be relevant to -current.

Hardware: K6-2/300, 64MB ECC SDRAM, Fireport40 (ncr 875j) U/W SCSI, 2xDCAS-34330 + 1xDDRS-39130 disks (all U/W from IBM), Toshiba CD, Exabyte 8200. This hardware has run happily for a long time, and often experiences high load.

I was extracting from the Exabyte to the DDRS disk while applying a CTM update from that disk against one of the DCAS disks when it crashed. The Exabyte went wonky (took about 6 goes to get the tape ejected) and the rest of the disk system locked up. The SCSI adapter was so confused I had to power down.

When it recovered, there was an "UNEXPECTED SOFT UPDATE INCONSISTENCY" which turned out to be a referenced but unallocated inode, and two zero size directories. In all, a couple dozen files ended up in lost+found.

So, what do people think is the most likely:

1) the SCSI adapter told the disk to write crap to a couple places on the disk (breaking an inode and some directories)

2) softupdates can't handle a sudden interruption (leaving many unwritten blocks)

If many other people have survived sudden power loss or similar no-sync type crashes, I'll be happy to believe option 1 caused my problem. If not, then perhaps softupdates still has incomplete handling of dependencies.

Of course, if it is option 1, I'm keen to know what's wrong with the current driver!

Stephen.

By the way, I have a 3.2-stable machine at work on which I installed revision 1.27 of softupdates, instead of the 1.20.2.2 == 1.24 normally included. It hasn't crashed ever :-) so I don't know if that will cause me problems. It works fine in normal operation. Perhaps it is time to MFC.
Small fix to netstat argument processing
I've got very used to an alias ns='netstat -f inet' which lets me do all the things I like to do without annoying me with stuff I don't want to see. All the options that don't care about the address family just ignore that option. Or, used to. Recently that changed, and "netstat -f inet -i" in particular changed to give the -f flag priority over the -i flag. This makes no sense to me, so I intend to commit this patch:

--- netstat/main.c.old  Tue Jan  4 16:14:46 2000
+++ netstat/main.c      Thu Jan  6 18:19:24 2000
@@ -460,9 +460,6 @@
          */
 #endif
         if (iflag) {
-                if (af != AF_UNSPEC)
-                        goto protostat;
-
                 kread(0, 0, 0);
                 intpr(interval, nl[N_IFNET].n_value, NULL);
                 exit(0);
@@ -501,7 +498,6 @@
                 exit(0);
         }
 
-    protostat:
         kread(0, 0, 0);
         if (af == AF_INET || af == AF_UNSPEC)
                 for (tp = protox; tp->pr_name; tp++)

It removes the special case that specifically makes "netstat -f inet -i" act the opposite to the way it used to (and the way I expect).

Any problems, folks? Is there some bizarre IPv6 impact I've not seen?

Hmm, I've just noticed some small misalignment of column headings in the default output. I'll fix that too.

Stephen.

PS Roll on 4.0-RELEASE!
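The behaviour change can be boiled down to a toy model. This is a hypothetical reduction of the dispatch in netstat's main(), not the real code: the enum, the function names and the two-flag interface are all invented for illustration, and every other option that main() checks between the two code paths is ignored.

```c
/* Hypothetical display modes; netstat itself has no such enum. */
enum display_mode {
    MODE_INTERFACES,    /* what intpr() prints */
    MODE_PROTOCOLS      /* the per-protocol loop after "protostat:" */
};

/*
 * Behaviour with the special case in place: an explicit address
 * family makes -i fall through to the protocol display.
 */
enum display_mode mode_with_special_case(int iflag, int af_given)
{
    if (iflag && !af_given)
        return MODE_INTERFACES;
    return MODE_PROTOCOLS;
}

/* Behaviour with the special case removed: -i always wins. */
enum display_mode mode_without_special_case(int iflag, int af_given)
{
    (void)af_given;     /* the address family no longer affects -i */
    if (iflag)
        return MODE_INTERFACES;
    return MODE_PROTOCOLS;
}
```

In this model "netstat -f inet -i" corresponds to iflag = 1, af_given = 1: the first function yields the protocol display and the second yields the interface display, which is the difference the patch is after.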
Re: Small fix to netstat argument processing
On Thursday, 6th January 2000, Yoshinobu Inoue wrote:

>Do these patches fix your problem, or is another, better
>fix desired? Please give me any opinions.

It passes all my tests. Please commit it. Thank you!

And earlier you wrote:

>Because now there is interface statistics display mode, when, e.g.
>
>  netstat -s -I bar0 -f inet6
>
>is specified. (though this is inet6 only now.)

I see where you are going now. The syntax of netstat, already complex, is becoming even more complex. More detail in the man page will be necessary soon. Also, the "iflag" variable might have too many uses now. But this can wait, now that the immediate difficulties have been resolved.

Stephen.
Crash from ^T during heavy paging
I'm currently giving 4.0 a thrashing in the best way I know. I run way too much stuff and let it page madly all day. Here's how I killed it:

1) pick a 32MB box
2) make -j20 buildworld
3) lean on ^T and let autorepeat go for it

Soon it dies in calcru() called from ttyinfo().

The stack trace showed that I caught it part way through a fork(). In calcru(), p->p_stats has a bad value because it is initialised in vm_fork() sometime *after* the P_INMEM flag is set, and there are some M_WAITOK mallocs between them.

The problem is that calcru() thinks that P_INMEM means that the proc structure is fully and accurately populated. But P_INMEM is one of the first flags set.

A few places test for p->p_stats == NULL but that doesn't look applicable since p->p_stats is uninitialised in this case. Hmm. I can't see any use for that test at first glance.

So, calcru() and possibly some other places, are looking at a struct proc before it's all there. What's the "proper" way to do it?

Stephen.
Re: Why not a default number of pings?
On Tuesday, 18th January 2000, "Leif Neland" wrote:

>I've been hit by a "forgotten ping" again.
>
>I still do not see a reason for not having a default number of pings, instead
>of infinite. The only reason I've seen is "It's always been so".

I find this argument rather odd. Train yourself to not forget your running ping. If you forget ping, then you probably forget to log out, forget to back up your machine, or forget your car keys. It's not ping's fault.

>Even if a default of 4 pings is not acceptable, because windows does it that
>way, why not a large default then?

A large but finite default is a surprise to seasoned users. That's bad. A small default is also a surprise, but you get the surprise quickly.

>If somebody _really_ want to ping forever, let them use -t0, and defend the
>rest of us from our blunders of forgetting a ping, keeping the line open
>infinitely.

alias ping='ping -c4'

What's so hard about this? Why break ping for the rest of us when you have total control of your own circumstances?

>How about a MAX_PING=3600 in make.conf or so?

Unnecessary cruft. We have plenty of cruft already, and don't need any more.

Stephen.
That fix for the ^T crash
Hi, Brian!

I'm concerned that your fix won't make it before the code freeze. Is there a problem with it? I admit I haven't actually tested it. :-( My excuse is that I assumed you had.

Or should I just do a quick test on your patch (+ bde fixes) and commit it myself?

Stephen.
Re: Problems installing FreeBSD 4.0 20000125-CURRENT
On Thursday, 27th January 2000, "Rodney W. Grimes" wrote:

>> < >said:
>>
>> >> 3. On the first reboot after installing, the keyboard was in a funny
>> >> state.
>
>I have seen this on numerous occasions, but have never tracked it down
>to any one specific thing. All on desktop and servers, but that's
>only because we don't do laptops.
>
>I have not seen it in quite some time (about a month), so I am thinking
>it has probably been unknowingly fixed someplace. I'll keep an eye
>out for it.

I had this problem on several machines back around version 3.2. I assumed it was a problem between X11 and the keyboard driver. I added a 2 second delay before starting xdm and had no problems after that. I've not seen the problem without X11 being involved.

I admit I just forgot about it after I got my workstation going. :-(

Stephen.
/dev/random limited to irq < 16
I found out much to my surprise that our SMP box is not collecting ANY entropy for /dev/random. All the interesting IRQs are over 16, and nobody uses the console.

From sys/i386/i386/mem.c 1.79:

/*
 * XXX the data is 16-bit due to a historical botch, so we use
 * magic 16's instead of ICU_LEN and can't support 24 interrupts
 * under SMP.
 */

Why don't we just flip this from a 16 bit to a 32 bit parameter in time for 4.0-RELEASE? Should just require a quick fiddle in mem.c and in rndcontrol.

Stephen.
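The width problem is easy to demonstrate in isolation. The sketch below assumes the interrupt selection is carried as a bitmask with one bit per IRQ, which the quoted comment suggests but does not spell out; `irq_mask16` and `irq_mask32` are hypothetical helpers, not functions from mem.c or rndcontrol.

```c
#include <stdint.h>

/*
 * One bit per IRQ in a 16-bit quantity: any IRQ of 16 or above is
 * silently lost when the value is narrowed.
 */
uint16_t irq_mask16(unsigned int irq)
{
    if (irq >= 32)
        return 0;
    return (uint16_t)((uint32_t)1 << irq);
}

/* The same bit in a 32-bit quantity survives for IRQs 0 to 31. */
uint32_t irq_mask32(unsigned int irq)
{
    if (irq >= 32)
        return 0;
    return (uint32_t)1 << irq;
}
```

For IRQ 5 both helpers return 0x20. For IRQ 17 the 16-bit version returns 0 while the 32-bit version returns 0x20000, which is the reason a wider parameter would let interrupts above 15 be selected.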
Re: Softupdates reliability?
On Tuesday, 24th August 1999, Peter Jeremy wrote:

>The exact order of events is not clear from this. In general, I'd say
>that if something managed to upset the SCSI bus sufficiently to
>confuse every target on it, then there's a reasonable likelihood that
>data transfers were also corrupted. A serious bus corruption during a
>disk write (either command or data phase) would have a reasonable
>chance of resulting in corrupt data on the disk (either the wrong data
>in the right place or the right data in the wrong place).

Yes, I can't tell whether the confused SCSI adapter upset the Exabyte and maybe zero'd some disk sectors, or whether the Exabyte went bananas first and took out everything else. This system gets a LOT of use (I'm using it right now), but the Exabyte obviously isn't used as often as the disks. I might move the Exabyte on to an aha1542 as a precaution.

>I'm not sure how to go about isolating the problem. I don't suppose
>you happened to bump one of the cables, or suffer a power glitch?

No power glitch or bumped cables. All quality gear, no overclocking, good cooling, surge suppressors, etc.

I don't like "It was just one of those things". That's not how computers work. I've either got bad hardware or there are bugs. To counter the bugs, I'm about to go to the latest -stable. Bad hardware will show itself eventually.

What I really should do is build a test system with softupdates and crash it a lot. (Using DDB to pause, then switch off, so no partial writes.) Could take a while...

Oh, and Brian wanted to know the processor revision. I don't know of any problems with K6-2/300s, but here's the info:

CPU: AMD-K6(tm) 3D processor (300.68-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x580  Stepping=0
  Features=0x8001bf

Stephen
K6-2 revisions (was: Re: Softupdates reliability?)
On Tuesday, 24th August 1999, "Brian F. Feldman" wrote:

>On Tue, 24 Aug 1999, Richard Tobin wrote:
>
>> > > Origin = "AuthenticAMD" Id = 0x580 Stepping=0
>>
>> > You have one of the first K6-2s off the line. There were definite problems
>> > with these, and as such, they were specially distinguished by having 66
>> > printed on top.
>>
>> I have a 0x580 which has had no problems at all. I'm pretty certain
>> it doesn't have 66 stamped on it. Are they all supposed to have this,
>> or were they tested and the dodgy ones stamped 66?
>
>It must be the latter. My 0x580 had the 66, so it must be that the dodgy
>ones got labelled 66 and not all the 0x580s were defective.

I think the story went along the lines that AMD were making K6-2/300's for a while, then went to a less rigorous test procedure for just a short time until they realised that some of the processors they released wouldn't work at 100MHz bus speeds, though they were ok at 66MHz. So they went back to the better testing procedure for the 100MHz models, but also released some 66MHz only models.

Mine was indeed one of the earliest, but there have been no problems with it, and during my strange disk crash the CPU kept updating the X11 load graph and stuff. The problem(s) must be elsewhere.

Stephen.
Re: Softupdates reliability?
On Tuesday, 24th August 1999, Wilko Bulte wrote:

>Hmm. I would generally expect SCSI errors etc to occur. Assuming the driver
>reports those one would at least know the bus was whacko.

I saw no errors, but that's not entirely surprising since I was running X11 and by that time xconsole was probably swapped out, and the disk system was stuck, so it wouldn't have been able to report anything. I gave up on a serial console a very long time ago because this machine is so reliable. :-)

Also, I recall (rumour?) that the ncr driver is not as robust in the face of errors as the adaptec driver, at least with CAM. Anybody know the facts?

I know, for example, that I can't get bad block lists using my scsi adapter, but people using adaptecs can. That shows that the ncr driver is in some sense incomplete. I've been meaning to look into that, but you know how time gets away.

So, after all this, I still don't know if I have any real evidence of anything at all. I'll just have to keep at it until it happens again.

Stephen.
SCSI surprise! (was: Softupdates reliability?)
[I'm trying my first crosspost experiment here. Please follow up to -scsi.]

A week ago I posted my strange crash and subsequent doubts about the proper functioning of softupdates. This is more of the story.

I examined the lost+found directory more closely and of the few files that I traced, they were all temporary files or newly created directories (ports actually) created in the CTM update process. So, maybe I didn't really lose anything. Maybe fsck just doesn't recognise one of the safe-but-crashed modes you get when using softupdates.

But unfortunately, I needed a CVS tree urgently and restored a backup. To make up for this, I promise to do serious destruction testing of softupdates soon.

But, I had another crash almost as soon as I started using the machine again. Again, the Exabyte was being used (but only rewinding at the time), but the obvious trigger this time was intense disk activity (from "rm"). The active file system was not using softupdates, and had a number of fsck -p correctable errors on reboot.

Conclusions:

1) The Exabyte was not to blame for the crash
2) The crash wasn't a "scribble junk" crash (first one probably wasn't either)
3) Regular mounts are still safer than softupdates

I took the lid off anyway hoping to find anything at all weird and noticed something I had forgotten. I was using a Seagate ST51080N 1GB disk earlier for some experimenting and had disconnected the POWER, but not the SCSI CABLE. (It's a really noisy drive!)

When I also unplugged the SCSI cable, all crashes stopped. I've now used the machine intensively for several days (copying over 20GB of small and big files, and read and written several tapes) without incident.

Conclusions:

4) My stepping of K6-2/300 is just fine
5) My Exabyte really is ok :-)
6) It is NOT safe to have a powered down SCSI device attached to a SCSI chain
7) The world really is a wonderful place ;-)

So, apart from being happy at having stable hardware again, I am intensely curious about this. Why is a powered down SCSI device so nasty? For example, the first crash locked up my SCSI card so that reset didn't fix it, and the second crash hung one of my disks so that it had to be powered down to even be recognised! Is there a standard for this stuff?

Stephen.
Re: optional 'make release' speed-up patch
On Thursday, 9th September 1999, "John W. DeBoskey" wrote:

> The following patch to /usr/src/release/Makefile allows the
>specification of the variable FASTCLEAN, which instead of doing
>a recursive rm on CHROOTDIR, simply umounts/newfs/mounts.

>+ -device=`df | grep '${CHROOTDIR}' | cut -f1 -d' '` && \
>+ /sbin/umount ${CHROOTDIR} && \
>+ /sbin/newfs $$device && \
>+ /sbin/mount ${CHROOTDIR} && \
>+ /bin/df ${CHROOTDIR}

This is going to look like I'm putting the boot in after everyone else has already expressed a negative opinion, but I want to reinforce why this is a dangerous option, and I think a bit of unhappiness now will result in safer future contributions. I'm really not trying to be a pain.

First up, destroying file systems in a makefile is a very rare event, and a pretty spectacular trick to use as a performance optimisation. Admittedly a make release is heavy stuff already, but destroying file systems is one step further than expected.

Have you tested this and verified that it is a major time saving? I suspect it is not. Optimising the 10% case instead of the 90% case just increases the likelihood of bugs, and it is doubly risky to use the big guns on the small fry.

The proposed code isn't very careful, and would attempt to destroy the wrong file system if, for example, I had CHROOTDIR set to /d. (Maybe I like calling file systems /a, /b, /c, etc like those crazy folks on freefall.) I doubt that it would succeed (because of checks for mounted file systems) but it would try. So, the code should verify that CHROOTDIR is a local mounted file system, and of the type you intend to make.

The code runs newfs on the block device. It really should run on the character device. In a dangerous thing like on-the-fly file system destruction and creation, precision is important, even if only to instill confidence in the user when it runs.

Defining FASTCLEAN to destroy a file system is a surprise unless you are intimately familiar with the makefile. That's a bear trap on the nature walk. For example, I used MACHINE all the time in my .profile until it started screwing up FreeBSD compiles. FASTCLEAN is probably somebody's favourite variable for some unimportant thing. They might never run make release, but it's lurking there for them when they do. The variable name should be more descriptive, and require that it be set to a particular value before it triggers.

So, what's the upside of all this gloom? Do I really think this is the most dangerous thing I've ever seen? Well, no. I just think it is inadvisable. There is nothing stopping you from creating a script that runs make release for you. Then you can newfs your filesystem there, fully aware of the risks, and fully in control.

For everyone else, the enormous rm is a useful test of the softupdates code. Most things have a silver lining if you know how to look at them. :-)

Stephen.
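The /d example can be made concrete. The sketch below contrasts a grep-style substring test with an exact comparison against the mount point, assuming the mount point is the last space-separated field of a df output line and contains no spaces. The function names and the sample df lines used to exercise them are hypothetical.

```c
#include <string.h>

/*
 * What "grep '${CHROOTDIR}'" effectively does: succeed if the string
 * appears anywhere in the line, including inside a device name.
 */
int grep_style_match(const char *df_line, const char *chrootdir)
{
    return strstr(df_line, chrootdir) != NULL;
}

/*
 * Stricter test: compare only the last space-separated field (assumed
 * to be the mount point) and require it to equal CHROOTDIR exactly.
 */
int mountpoint_match(const char *df_line, const char *chrootdir)
{
    const char *last = strrchr(df_line, ' ');

    last = (last != NULL) ? last + 1 : df_line;
    return strcmp(last, chrootdir) == 0;
}
```

Given a line such as "/dev/da0s1a 99183 40123 51126 44% /" and CHROOTDIR set to /d, the substring test matches because "/dev" contains "/d", while the exact test matches only a line whose mount point really is /d. An exact match is still only one of the checks asked for above; it does not establish that the file system is local or of the intended type.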
Fsck follies
I was giving vinum + softupdates a bit of a workout on 4 really old SCSI disks (Sun shoeboxes, if you must know) attached to an aha1542B. The rest of the machine is a Pentium 133 with 64MB of parity ram, a few more disks, and another aha1542B. It runs -current (about 10 days old now).

I was copying a newer -current source tree onto the box when I lost power to my house for maybe half a second. Being foolish and shortsighted, I have no UPS.

(An interesting side note: out of the 3 machines in use at the time, 2 of the keyboards locked up and required a power down to recover. I was unaware that keyboards could crash.)

When the system came back up, fsck -p didn't like the vinum volume. No sweat, I ran it manually. There were many

    INCORRECT BLOCK COUNT I= (4 should be 0)

messages. I assumed this was an artifact of soft updates. The fsck completed successfully.

Being paranoid, I reran fsck. This time it reported a number of unreferenced inodes (199 to be exact), and linked them in to lost+found.

It is this last item that bothers me. When the first fsck completed, the filesystem should have been structurally correct. But it wasn't. A third fsck confirmed that 2 runs of fsck were enough.

I seem to recall sage advice from days gone by to always run fsck twice and sync thrice. I thought I could forget all that stuff nowadays.

By the way, I saved the broken old source tree and compared it to the full tree. There were no unusual differences, except for the broken one being incomplete. So, if fsck were a little better, things would be fine. As good as you could expect, given a power failure.

Stephen.
Re: Fsck follies
On Sunday, 21st November 1999, Christopher Masto wrote:

>On Sun, Nov 21, 1999 at 10:36:32PM +1000, Stephen McKay wrote:
>> When the system came back up, fsck -p didn't like the vinum volume.
>> No sweat, I ran it manually. There were many
>>
>> INCORRECT BLOCK COUNT I= (4 should be 0)
>>
>> messages. I assumed this was an artifact of soft updates. The fsck
>> completed successfully.
>>
>> Being paranoid, I reran fsck. This time it reported a number of
>> unreferenced inodes (199 to be exact), and linked them in to lost+found.
>>
>> It is this last item that bothers me. When the first fsck completed,
>> the filesystem should have been structurally correct. But it wasn't.
>> A third fsck confirmed that 2 runs of fsck were enough.
>
>Presumably you are using vinum for mirroring? I have had to stop
>doing so after trashing several filesystems. There are some serious
>bugs that allow the plexes to get out of sync; as reads from a mirror
>set are round-robin, this can be very bad.

No, I was just striping them (4 x 660 MB disks, 96 KB interleave). Vinum had nothing to do with the problem. I was just reporting all the facts, just in case.

I think there is a fault in fsck. Possibly it is because softupdates changed the rules. Having run md5 over the good copy and the broken (power failure interrupted) copy as well as everything in lost+found, I can say that no corrupted files survived, and everything in lost+found was a good copy of some file or other. So softupdates appears to be doing the right thing. But fsck didn't fix everything broken by the power interruption.

Stephen.
Re: Fsck follies
On Monday, 22nd November 1999, Bernd Walter wrote: >On Mon, Nov 22, 1999 at 02:57:39PM +1000, Stephen McKay wrote: >> I think there is a fault in fsck. Possibly it is because softupdates >> changed the rules. Having run md5 over the good copy and the broken >> (power failure interrupted) copy as well as everything in lost+found, >> I can say that no corrupted files survived, and everything in lost+found >> was a good copy of some file or other. So softupdates appears to be >> doing the right thing. But fsck didn't fix everything broken by the >> power interruption. >> >Sometimes fsck tells you that it needs a rerun. >See /usr/share/doc/smm/03.fsck/paper.ascii.gz for details about fsck. >Are you sure that this was not the case? It should print "PLEASE RERUN FSCK" in that case. It did not do so. It looks like other messages like "FILE SYSTEM MARKED DIRTY" are likely in this case. It said "FILE SYSTEM MARKED CLEAN" that evening. Eventually I'll get the time to do repeated powerdowns of my equipment to try to reproduce this. I hope you can see why I'm not rushing into this. :-) Stephen.
Re: HEADSUP: wd driver will be retired!
On Friday, 10th December 1999, "Kenneth D. Merry" wrote: >Brad Knowles wrote... >> At 3:05 PM -0700 1999/12/10, Kenneth D. Merry wrote: >> >> > I agree that the CAM integration shouldn't be used as a precedent here. >> > I don't agree with your characterization of it as a "debacle", though. >> > >> > On the whole, we gained a whole lot and lost very little. >> >> Long-term, yes I believe we gained a lot. Short-term, what I >> recall having heard from some of the people who lived through it, >> well let's just say it was really ugly and nasty for a certain period >> of time. > >I don't think it was ugly and nasty at all. You're basing your opinions >on second hand hearsay. If you can produce specific examples of why it >was "really ugly and nasty", fine, but why not avoid making statements you >can't support? This must depend on your perspective. My first hand view is that it was ugly and nasty. This is because I lost support for hardware I was actively using (some temporarily, some permanently), and because I had no control over the pace of change. For a bunch of reasons, there was no way I could keep up (and that meant porting old drivers to keep up). It sure felt ugly to me. The unnecessary renaming of device files made it worse. But that shouldn't stop us from moving forward with the ata driver. I think that a small slowing of the pace, and a bit more understanding toward those with unusual hardware will help. And I support PHK's hard line stance (except for the rushed pace) toward making the kernel break for users of wd. It has to be so, or no one will move. The wd code will still be in the CVS tree for desperate people to revive to use, and to port the missing bits into the ata driver. Stephen.
Re: HEADSUP: wd driver will be retired!
On Friday, 10th December 1999, Mike Smith wrote:

>The same mentality that made the CAM cutover a "debacle" is making the
>ata cutover a "debacle".

This "mentality" might be an unavoidable part of human nature. I found my first reaction was "How dare they take away something I have now?!" and it took some careful thinking to see that my loss was actually very small against future benefits.

It might be that these things have to be predicted by -core and handled "touchy feely" like:

    core:   What if we do this ?
    public: Um, sounds scary. When will you do it? Will I lose anything?
    core:   We think a month from now, and you will lose support for and .
    public: We don't use and any more, so fine.

instead of the current (caricatured for emphasis):

    core:   We will do soon. Probably today.
    public: Oh my God!
    core:   It's for your own good. You always complain and make it difficult!
    public: We don't want to change anything, ever! It's so hard! You must support all my hardware for ever and ever!

>Fortunately, the CAM folks persisted despite the criticism, and I'm glad
>to see that Soren is taking the same stance.

Not everything improved with CAM. Personally I'm only receiving the benefits of CAM now, about a year after it replaced the old system. On balance it has been good for FreeBSD, but you have to remember that there will be small pockets of users that will get the short end of the stick. How the project deals with the losers in these deals is important for its long term health.

Stephen.
dc driver and underruns (was: Strangeness with 4.0-S)
On Monday, 10th July 2000, Stefan Esser wrote:

>On 2000-07-09 20:52 +1000, Stephen McKay <[EMAIL PROTECTED]> wrote:
>> On Saturday, 8th July 2000, Stefan Esser wrote:
>>
>>>Oh, there are renegotiations after each overrun ???
>> The code at the point that an underrun is detected is:
>>
>>     printf("dc%d: TX underrun -- ", sc->dc_unit);
>>     if (DC_IS_DAVICOM(sc) || DC_IS_INTEL(sc))
>>         dc_init(sc);
>>
>> After that, it sets the new threshold, or store and forward mode. That
>> conditional (which resets the DE-500 style cards I own), looks deliberate
>> since it is so specific. Either that, or Bill was being conservative.
>> When I get a chance, I will experiment with removing it.
>
>Well, the DE Driver (DEC 21x4x) has (relevant lines marked ***):
>
> [SNIP: code showing de driver does not reset chip]

I've now read the 21143 chip manual from Intel. What the de driver does is illegal (the transmitter must be idle when the threshold is changed). I don't know if it works in practice; the de driver didn't work well for me. What the dc driver does is overkill. I will implement some changes, based on the documentation, and see what happens.

Of course, Bill, if you have direct experience that contradicts the documentation (as if I've never seen incorrect doco...) then I'm all ears. I also have a very limited range of test hardware.

>I agree, that for chips that need to be completely re-initialized, the
>default might be store-and-forward ...
>There are so many DEC 21x4x clones, all slightly different, and it seems
>that at least a few need the chip reset.

There is already a convenient store-and-forward-only flag that is set for one of the supported chips. I propose that this flag be set on all hardware that cannot have the threshold changed without a reset.

>> It hides the problem very well for me. I really can't see the tiniest
>> of performance loss with store and forward. Maybe it's something that
>> only shows up on benchmarks.
>
>Guess it will show up if you measure latencies (or your application is
>doing lots of RPCs). But as soon as there is a cheap 100baseT switch in
>the path to the destination, there will be store-and-forward at work ;-)

Does anyone here actually measure these latencies? I know for a fact that nothing I've ever done would or could be affected by extra latencies that are as small as the ones we are discussing. Does anybody at all depend on the start-transmitting-before-DMA-completed feature we are discussing?

Lastly, some people really want to keep the messages. Is hiding them behind bootverbose enough? Or do I have to add a flag/hint? No, I haven't looked at the new hint system, so I don't know if I should be afraid or not. :-)

Stephen.
Re: dc driver and underruns (was: Strangeness with 4.0-S)
On Thursday, 13th July 2000, "Rodney W. Grimes" wrote:

>>On Thu, 13 Jul 2000, Stephen McKay wrote:
>>
>>>Does anyone here actually measure these latencies? I know for a fact
>>>that nothing I've ever done would or could be affected by extra latencies
>>>that are as small as the ones we are discussing. Does anybody at all
>>>depend on the start-transmitting-before-DMA-completed feature we are
>>>discussing?
>>
>> I don't like the idea of removing that feature. Perhaps it should be a
>> sysctl or ifconfig option, but it should definitely remain available.
>> Those minute latencies are critical to those of us who use MPI for
>> complex parallel calculations.
>
>I have to agree here. The store and forward adds an approximate
>11uS (by theory under ideal conditions 1500bytes@132MB/s = 11uS,
>practice actually makes this worse as typical PCI does something
>less than 100MB/s or 15uS) to a 120uS packet time on the wire (again,
>ideal, but here given that switches, and in fact often cut-through
>switches, are used for these types of things, ideal and practice
>are very close.)
>
>I don't think these folks, nor myself, are wanting^H^H^H^H^H^H^Hilling
>to give up 12.5%.

OK. It seems that repairing the feature, rather than disabling it, is the most popular option. Still, I am quite interested in finding anyone who actually measures these things, and is affected by them.

These very same people might be able to trace why we get the underruns in the first place. I suspect an interaction between the ATA driver and VIA chipsets, because other than the network, that's all that is operating when I see the underruns. And my Celeron with a ZX chipset is immune.

Back to the technical, for a moment. I have verified that stopping the transmitter on the 21143 is both sufficient and necessary to enable the thresholds to be set. I have code that works on my machine. I intend to commit it when I think it looks neat enough.

Getting even more technical, it appears to me that the current driver instructs the 21143 to poll for transmit packets (ie a small DMA) every 80us even if there are none to be sent. I don't know what percentage of bus time this might be, or even how to calculate it (got some time Rod?) but it looks unnecessary to me. I think the transmitter could be turned off regularly. At the moment, the driver leaves it on all the time.

And to the non technical: Do the messages go or stay? I've heard both sides. For most people they are just annoying fluff. For those who actually care about the latency, it might be informative, and thus too useful to be hidden behind bootverbose. Opinions?

Stephen.
Re: dc driver and underruns (was: Strangeness with 4.0-S)
On Friday, 14th July 2000, Matthew Jacob wrote: > >> That theory is not correct, I have seen multiple Alpha machines reporting >> buffer underruns as well. No ATA disk in sight there.. > >This has been a reported feature of the tulip chip and alphas (de driver >usually) forever forever forever. And there's no guarantee that there is just one cause. If the dc driver with BX and ZX chipsets never has an underrun, and the 2 VIA chipsets I've tried always cause underruns, there might be something we can fix. Even if we never manage to fix it on Alphas. >It's not a bug, per se, IMO. In the i386 case, there's some sort of PCI bus starvation. Maybe we can fix it. Maybe not. We can at least try to categorise it. Maybe it's as simple as a BIOS option we should tweak. Stephen.
Re: dc driver and underruns (was: Strangeness with 4.0-S)
On Friday, 14th July 2000, "Rodney W. Grimes" wrote: >> I suspect an interaction between the ATA driver and VIA chipsets, >> because other than the network, that's all that is operating when I see >> the underruns. And my Celeron with a ZX chipset is immune. > >I've seen them on just about everything, chipset doesn't seem to matter, >IDE or SCSI doesn't seem to matter. Well, maybe they are just a fact of life. But using just my vague knowledge of how PCI works, it doesn't look inevitable to me. So I see bugs. :-) >> Getting even more technical, it appears to me that the current driver >> instructs the 21143 to poll for transmit packets (ie a small DMA) >> every 80us even if there are none to be sent. I don't know what percentage >> of bus time this might be, or even how to calculate it (got some time Rod?) > >I'll have to look at that. If it is a simple 32 bit read every 80uS >that's something like .1515% of the PCI bandwidth, something that shouldn't >matter much. (I assumed a simple 4 cycle PCI operation). Just how big >is this DMA operation every 80uS? I believe it is just one 32 bit read. But I don't understand that aspect of the hardware very well yet. I also suspect that this polling adds to the latency, but again, I haven't got to the end of that either. Sometimes other things can distract you from even the most interesting technical matter. :-) Stephen.
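For readers who want to check the arithmetic traded in this thread, the following is a minimal C sketch of the back-of-envelope estimates: the time to copy a full frame across PCI, the time the same frame spends on a 100 Mbit/s wire, and the share of PCI cycles taken by one 4 cycle read every 80us. The 1500 byte frame, the 132MB/s and 100MB/s rates, the 4 cycle read and the 80us interval are the figures quoted above. The function names and the 33 MHz PCI clock are assumptions made for illustration only, and nothing here comes from the dc or de driver sources.

```c
#include <stdio.h>

/*
 * Microseconds needed to move `bytes` bytes at `mbytes_per_sec`
 * (10^6 bytes per second).  One MB/s is one byte per microsecond,
 * so the division gives microseconds directly.
 */
double
transfer_us(double bytes, double mbytes_per_sec)
{
	return bytes / mbytes_per_sec;
}

/* Microseconds a frame of `bytes` bytes occupies a `mbits_per_sec` link. */
double
wire_us(double bytes, double mbits_per_sec)
{
	return bytes * 8.0 / mbits_per_sec;
}

/*
 * Percentage of PCI clock cycles used by a transaction of `cycles`
 * cycles repeated every `period_us` microseconds.  The 33 MHz clock
 * passed in by the caller below is an assumption, not a measured value.
 */
double
poll_bus_percent(double cycles, double pci_mhz, double period_us)
{
	/* pci_mhz * period_us is the number of bus cycles in one period. */
	return 100.0 * cycles / (pci_mhz * period_us);
}

/* Print the estimates for a 1500 byte frame on 100 Mbit/s Ethernet. */
void
print_estimates(void)
{
	double wire = wire_us(1500.0, 100.0);

	printf("wire time:           %.1f us\n", wire);
	printf("PCI copy at 132MB/s: %.1f us\n", transfer_us(1500.0, 132.0));
	printf("PCI copy at 100MB/s: %.1f us\n", transfer_us(1500.0, 100.0));
	printf("added latency:       %.1f%%\n",
	    100.0 * transfer_us(1500.0, 100.0) / wire);
	printf("80us poll, 4 cycles: %.4f%% of bus cycles\n",
	    poll_bus_percent(4.0, 33.0, 80.0));
}
```

Calling print_estimates() from a main() is enough to reproduce the numbers. The sketch only counts cycles; it says nothing about arbitration delays or bus contention, which is where the underruns presumably come from.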
Ugly, slow shutdown
I'm off in a few days for a couple months of tourism in Europe (no, no need for sympathy!), so I'm dumping these couple ideas on you and running.

I think shutdown time has gotten uglier and slower than it needs to be. I want to apply these patches (well, at least the first one) before I escape radar range. Your job is to not object much. :-)

Patch 1 replaces:

    Waiting (max 60 seconds) for system process `bufdaemon' to stop...stopped

with

    Stopping bufdaemon

Also:

    syncing disks... 10 10 3
    done

returns to the traditional

    syncing disks... 10 10 3 done

Patch 2 is smaller and possibly controversial. Normally bufdaemon and syncer are sleeping when they are told to suspend. This delays shutdown by a few boring seconds. With this patch, it is zippier. I expect people to complain about this shortcut, but every sleeping process should expect to be woken for no reason at all. Basic kernel premise.

I've been running these patches on a 4.x machine for a while now. No problems except I am now surprised by the slow and ugly shutdown of unpatched machines. :-)

I apologise that I've not tested these against -current. That's the bit that I've skipped because I'm out of time. There should be no difference between 4.x and -current in this area though. These patches will apply cleanly against both.

Cheers,

Stephen.
Patch 1:

Index: kern_shutdown.c
===
RCS file: /cvs/src/sys/kern/kern_shutdown.c,v
retrieving revision 1.76
diff -u -r1.76 kern_shutdown.c
--- kern_shutdown.c	2000/07/04 11:25:22	1.76
+++ kern_shutdown.c	2000/07/06 15:02:21
@@ -247,7 +247,6 @@
 				sync(&proc0, NULL);
 				DELAY(5 * iter);
 			}
-			printf("\n");
 			/*
 			 * Count only busy local buffers to prevent forcing
 			 * a fsck if we're just a client of a wedged NFS server
@@ -261,6 +260,8 @@
 					    bp->b_vp->v_mount, mnt_list);
 					continue;
 				}
+				if (nbusy == 0)
+					printf("\n");
 				nbusy++;
 #if defined(SHOW_BUSYBUFS) || defined(DIAGNOSTIC)
 				printf(
@@ -593,12 +594,11 @@
 		return;
 
 	p = (struct proc *)arg;
-	printf("Waiting (max %d seconds) for system process `%s' to stop...",
-	    kproc_shutdown_wait, p->p_comm);
+	printf("Stopping %s", p->p_comm);
 	error = suspend_kproc(p, kproc_shutdown_wait * hz);
 
 	if (error == EWOULDBLOCK)
-		printf("timed out\n");
+		printf(": timed out\n");
 	else
-		printf("stopped\n");
+		printf("\n");
 }

Patch 2:

Index: kern_kthread.c
===
RCS file: /cvs/src/sys/kern/kern_kthread.c,v
retrieving revision 1.5
diff -u -r1.5 kern_kthread.c
--- kern_kthread.c	2000/01/10 08:00:58	1.5
+++ kern_kthread.c	2000/08/05 15:32:06
@@ -116,6 +116,12 @@
 	 */
 	if ((p->p_flag & P_SYSTEM) == 0)
 		return (EINVAL);
+	/*
+	 * The target process is probably just snoozing. Wake it up so
+	 * that it will notice that it should suspend itself.
+	 */
+	if (p->p_wchan != NULL)
+		wakeup(p->p_wchan);
 	SIGADDSET(p->p_siglist, SIGSTOP);
 	return tsleep((caddr_t)&p->p_siglist, PPAUSE, "suspkp", timo);
 }

TheEnd
Re: Ugly, slow shutdown
> * Mike Smith <[EMAIL PROTECTED]> [000807 01:25] wrote:
> > > * Stephen McKay <[EMAIL PROTECTED]> [000805 08:49] wrote:
> > > >
> > > > ... every sleeping process should expect
> > > > to be woken for no reason at all. Basic kernel premise.
> > >
> > > You better bet it's controversial, this isn't "Basic kernel premise"
> >
> > Actually, that depends. It is definitely poor programming practice to
> > not check the condition for which you slept on wakeup.
>
> Stephen's patches didn't give them that option, the syncer could be
> in some other part of vfs that doesn't expect to be woken up, perhaps
> in uninterruptible sleep... perhaps waiting for a DMA transfer?
>
> How does one check if the data filled into a buffer is actually from
> the driver and not just stale?

The time honoured standard is:

    raise cpu priority
    while (we do not have exclusive use of some item) {
        set some sort of "I want this item" flag (optional)
        sleep on a variable related to the item
    }
    use the item/data we waited for
    lower cpu priority

A typical example from vfs_subr.c:

    s = splbio();
    while (vp->v_numoutput) {
        vp->v_flag |= VBWAIT;
        error = tsleep((caddr_t)&vp->v_numoutput,
            slpflag | (PRIBIO + 1), "vinvlbuf", slptimeo);
        if (error) {
            splx(s);
            return (error);
        }
    }
    ... the code plays a little with vp here ...
    splx(s);

A simpler example from swap_pager.c:

    s = splbio();
    while ((bp->b_flags & B_DONE) == 0) {
        tsleep(bp, PVM, "swwrt", 0);
    }
    ... code uses bp here ...
    splx(s);

Both of these examples are safe from side effects due to waking up early. This is how all such code should be. To do otherwise is to introduce possible race conditions.

At your prompting, though, I've looked at more code and have found an example that violates this principle. I assume it is a bug waiting to bite us. In the 4.1.0 source (sorry, that's all I have on operational computers at this moment) line 581 of vfs_bio.c sleeps without looping. It would seem that Alfred's assertion of lurking danger is correct. This stuff should be fixed.

> > > *boom* *crash* *ow* :)
> >
> > Doctor: So don't do that.
> >
> > In this case, the relevant processes just need to learn to check whether
> > they've been woken in order to die.
>
> No, they need to signify that it's safe to wake them up early.

When I return to the land of FreeBSD I'll offer a speedup that does not wake processes in arbitrary places (to avoid tickling lurking bugs). To do this I would make processes that want to use the suspension mechanism call a routine in kern_kthread.c for their just-loafing-about sleep. Then that module will have enough information to do the job quickly.

And back to the simpler bit (the bike shed bit). Does everyone else actually *like* the verbose messages currently used? And the gratuitous extra newline in the "syncing..." message?

Stephen.

PS My main machine has blown its power supply. Contact with me will be patchy.
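The while-loop discipline argued for in this message can be tried outside the kernel. What follows is a minimal userland sketch, assuming POSIX threads: a mutex stands in for the raised cpu priority, pthread_cond_wait() for tsleep() and pthread_cond_broadcast() for wakeup(). The struct and function names are hypothetical, and this is an analogy rather than kernel code. POSIX explicitly allows pthread_cond_wait() to return without the condition having changed, so the same rule applies there.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical item that sleepers wait on, standing in for a buffer. */
struct item {
	pthread_mutex_t	lock;		/* plays the role of splbio()/splx() */
	pthread_cond_t	done_cv;	/* plays the role of the wait channel */
	bool		done;		/* the condition sleepers care about */
	int		wakeups;	/* times the sleeper returned from a wait */
};

void
item_init(struct item *it)
{
	pthread_mutex_init(&it->lock, NULL);
	pthread_cond_init(&it->done_cv, NULL);
	it->done = false;
	it->wakeups = 0;
}

/*
 * Sleep until it->done is true.  The condition is rechecked after every
 * return from the wait, so an early or stray wakeup simply sends the
 * caller back to sleep.
 */
void
item_wait_done(struct item *it)
{
	pthread_mutex_lock(&it->lock);
	while (!it->done) {
		pthread_cond_wait(&it->done_cv, &it->lock);
		it->wakeups++;
	}
	pthread_mutex_unlock(&it->lock);
}

/* A wakeup with no change of state: the analogue of a stray wakeup(). */
void
item_stray_wakeup(struct item *it)
{
	pthread_mutex_lock(&it->lock);
	pthread_cond_broadcast(&it->done_cv);
	pthread_mutex_unlock(&it->lock);
}

/* The real event: change the state first, then wake the sleepers. */
void
item_mark_done(struct item *it)
{
	pthread_mutex_lock(&it->lock);
	it->done = true;
	pthread_cond_broadcast(&it->done_cv);
	pthread_mutex_unlock(&it->lock);
}
```

With the loop in place, a thread blocked in item_wait_done() can be hit with any number of item_stray_wakeup() calls and will still only return after item_mark_done(). Replace the while with an if and the same stray call lets it proceed with done still false, which is the bug pattern under discussion.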
Re: Ugly, slow shutdown
Well, I've failed in my main objective (to deuglify the shutdown messages), but an interesting debate has resulted instead, so I can't feel too bad.

I did a little research to support my position on sleep/wakeup, and here's the best I have. This is pretty long, and unlikely to shake your world view, so those of you with drooping eyelids can just head over to slashdot, or something. :-)

Some pseudo code from "The Design of the Unix Operating System", by Maurice Bach, page 33 shows how sleep() is used:

    while (condition is true)
        sleep (event: the condition becomes false);
    set condition true;

and the next page shows how wakeup() is used:

    set condition false;
    wakeup (event: the condition is false);

In the description, it says `Thus, the "while-sleep" loop insures that at most one process can gain access to a resource.' Not the most convincing evidence, but on the other hand, he does not mention the idea of *not* protecting against sudden wakeup.

From "Writing a Unix Device Driver", by Egan and Teixeira, on page 92 we find

    It is not uncommon for several processes to sleep on the same channel. They may be competing for the same resource, or they may be waiting for different reasons that have been associated with the same channel value. In this situation a single wakeup call on the common channel will cause all the sleeping processes to become executable; ...

    A driver routine must not assume that it can proceed after a return from a sleep call. It should check to see whether the event it was waiting for has actually occurred; if it has not it should sleep again, and repeat this cycle until the awaited event has actually occurred.

The book is oriented rather towards I/O, so perhaps not all possible uses of routines are covered. But again, no mention of *not* using a while loop. Quite the opposite.

Also "Magic Garden Explained" points out that you really want to sleep on an "event", but all you have is the address of some data. So, you often have multiple semantically different events represented by the same integer wakeup channel. A good reason to program defensively, I think.

But the best evidence is from kern_synch.c from 4.2 BSD, line 98, in the header comment of the sleep() routine:

     * Callers of this routine must be prepared for
     * premature return, and check that the reason for
     * sleeping has gone away.

That comment on sleep() is present from 4.0 BSD up to and including 4.3 tahoe, but disappears in 4.3 reno, when the 4.4 style tsleep() was introduced. After a bit of searching through the PUPS archive, I see it is even present in Edition 6, character for character, in a file called slp.c.

Well, I knew I wasn't a senile old fart yet, and Kirk's BSD CD compendium and the PUPS archive show that I remember some things correctly still. For a considerable portion of Unix history, sleep() could return for no good reason at all, and was documented to do so (if only in the source code).

Now, how does this relate to the current day? Nobody in the BSD world uses plain sleep() any more. Once tsleep() appeared, the rules seem to have changed. Perhaps some people had gotten away with ignoring the dire warnings in the sleep() code, and decided that unexpected wakeups weren't such a useful part of the API. I hope Kirk or other BSD veterans can be coaxed into offering an opinion. I'd offer at least one beer for this purpose. :-)

Regardless of the history of it all, FreeBSD is full of places where unexpected wakeups can stuff you right up. Should we regard tsleep() like the older sleep() call, as suspect, and program defensively? Should we be pragmatic, admit "We've gotten away with it so far", and document the "no sudden wakeups" behaviour?

I quite like the general principle outlined in one of the earlier replies, that a while loop can be shown to be correct through a local code reading, but a simple conditional must be verified by reading all the rest of the code. That's close to the same argument I use against global variables. Their use is too hard to verify as correct.

In short, I'd like to see all cases where tsleep() is not carefully used in a loop repaired. Practically speaking, though, I can't see that happening, especially if we have any major players against the idea (DG for example). Given that, I'd like as a minimum a bit more of the history of sleep() in the tsleep() manual page, and a discussion of when a while-loop protected tsleep() is mandatory, and when it is optional. Some sort of pronouncement against issuing wakeup() calls against arbitrary addresses would help too.

I would do that right now, except I'm escaping computing for a few months. Almost heresy nowadays, I suppose. And I won't be the first in line for a brain implanted net connection either. ;-)

Stephen.

PS By the time you read this, I've probably unsubscr
Fix for broken "burncd msinfo" PR#27593
A number of people have complained that "burncd msinfo" returns the wrong value when there are already multiple sessions on a CD. This is true, and is bug bin/27593.

Since I burn a lot of multisession CDs, and have been working out the mkisofs -C values by hand with the help of "cdcontrol info", I thought now would be a good time to fix this bug. Unfortunately, I've found that burncd won't work with SCSI burners, and the only ATAPI burner I have is at work, and well, it's Christmas and all that. So this is completely untested, though I believe it should work. I hope this can make it into 4.5.

Stephen.

PS How much work would it be to add the CDRIO* ioctls to the SCSI cd driver?

Index: burncd.c
===
RCS file: /cvs/src/usr.sbin/burncd/burncd.c,v
retrieving revision 1.19
diff -u -r1.19 burncd.c
--- burncd.c	2001/12/24 03:20:10	1.19
+++ burncd.c	2001/12/25 13:45:48
@@ -149,10 +149,14 @@
 			break;
 		}
 		if (!strcasecmp(argv[arg], "msinfo")) {
+			struct ioc_toc_header header;
 			struct ioc_read_toc_single_entry entry;
 
+			if (ioctl(fd, CDIOREADTOCHEADER, &header) < 0)
+				err(EX_IOERR, "ioctl(CDIOREADTOCHEADER)");
 			bzero(&entry, sizeof(struct ioc_read_toc_single_entry));
 			entry.address_format = CD_LBA_FORMAT;
+			entry.track = header.ending_track;
 			if (ioctl(fd, CDIOREADTOCENTRY, &entry) < 0)
 				err(EX_IOERR, "ioctl(CDIOREADTOCENTRY)");
 			if (ioctl(fd, CDRIOCNEXTWRITEABLEADDR, &addr) < 0)
Another tweak to "burncd msinfo"
Now that "burncd msinfo" returns the correct values I noticed another small problem: it displays the result on stderr instead of stdout. Since very few people (nobody?) would be using this option yet because of the previous problem, it seems like nobody would be adversely affected by changing the output to stdout. Also, removing the whitespace in the output would help script writers. Can I commit the obvious patch? Stephen.
Re: Another tweak to "burncd msinfo"
On Saturday, 5th January 2002, Søren Schmidt wrote:

>It seems Stephen McKay wrote:
>> Now that "burncd msinfo" returns the correct values I noticed another small
>> problem: it displays the result on stderr instead of stdout.
>
>Hmm, that was intentional...

Could you explain why? The most obvious practical use would be:

    $ mkisofs -r -C `burncd msinfo` -M /dev/acd0c -o new.iso goodies

Writing to stderr means this doesn't work, and you have to add 2>&1 to it. Also the white space means you have to use extra quoting.

>> Can I commit the obvious patch?
>
>Could you just hang on for now, since I'm doing large changes to
>burncd just now in order to support other things, and keeping
>everybody's changes to the stock sources is not making things
>easier...

Are these changes intended for 4.5? I'm hoping the small change I proposed would be accepted into 4.5, before anybody starts using "burncd msinfo" in practice. I think this is sensible, even if a much improved burncd is scheduled for 4.6.

Regardless of this, I do not intend to commit any unwelcome changes.

Stephen.
Re: Another tweak to "burncd msinfo"
On Saturday, 5th January 2002, Søren Schmidt wrote: >It seems Stephen McKay wrote: >> >> Are these changes intended for 4.5? I'm hoping the small change I >> proposed would be accepted into 4.5, before anybody starts using >> "burncd msinfo" in practice. I think this is sensible, even if >> a much improved burncd is scheduled for 4.6. > >You should ask permission from the release engineer to commit it >to 4.5, but it really should be committed to -current first. Of course! But given how simple the change is, just a couple of days in -current would be sufficient testing. I am asking your approval to commit to -current, then I'll ask the REs about -stable. Does this mean you've decided that it is a beneficial change and won't interfere with your other work? Stephen.
Re: Another tweak to "burncd msinfo"
On Saturday, 5th January 2002, Søren Schmidt wrote: >I forgot to say that I already committed the change to current... :-) I try to keep up with -current, but that's too current for me! I'll hassle the REs tomorrow about permission to merge. Thanks, Stephen.
Re: Whatever happened to CTM?
On Tuesday, 20th March 2001, Ulf Zimmermann wrote: >On Mon, Mar 19, 2001 at 04:53:33PM -0800, John Baldwin wrote: >> >> On 20-Mar-01 Michael C . Wu wrote: >> > For all connections greater than 9600baud modems, we recommend >> > using CVSup to get src-all and ports-all updated. At the worst case, >> > be able to CVSup a ports-all collection within an hour, with heavy >> > packet loss and low bandwidth. >> > >> > i.e. CTM sucks, don't use it. :) On the contrary, I prefer CTM over CVSup, even on a fast connection (which I don't currently have). On a slow or intermittent connection, CTM beats CVSup by a large margin. >> cvsup is not available via e-mail for those who may only have e-mail access >> for one reason or another. Firewalls make CTM style delivery essential. (No, Stefan, I don't like your tunneling idea. :-) >I have been hosting the machine which ran ctm, And many thanks indeed for your service! >unfortunately my provider >cut me off and I just got some access back, but not for the location >the ctm machine is located at. > >At this time I do not know yet when it will have access again. Surely FreeBSD Inc (or whatever it is that owns the freebsd.org machines) could spring for a box. Assuming Ulf is still keen, it shouldn't be too hard for him to remote administer it. Stephen.
Re: Whatever happened to CTM?
On Thursday, 22nd March 2001, Bruce Evans wrote: >On Wed, 21 Mar 2001, Stephen McKay wrote: >> On the contrary, I prefer CTM over CVSup, even on a fast connection (which >> I don't currently have). On a slow or intermittent connection, CTM beats >> CVSup by a large margin. > >I'm not sure about that. CTM may be faster, but it works less >automatically, especially when it breaks, and it breaks often, at both >the server and client levels (mainly downtime problems for the server >and disk-full problems for the client). I used to use it until the >server broke one time too many last year. CTM's advantages outweigh the disadvantages for me. I don't run out of disk space(*), and the server failures have been rare. Certainly, the reliability of CTM delivery exceeded the reliability of all of the M$ systems the guys in the neighbouring cubicles managed at my previous employer. Until now, of course. What we need now is someone to supply hardware and some connectivity. I still think CTM has sufficient advantages to justify its continued existence. I think the project should fund it. Stephen. (*) The tangles you get into after ctm croaks from lack of disk space were supposed to have been fixed. I don't think they have been. Fixing them shouldn't be too difficult though. All those md5 checksums make repairs trivial to automate, in theory.
Implications of stdio changes (was Re: cvs commit: src/include stdio.h src/lib/libc Makefile src/lib)
On Tuesday, 14th August 2001, Daniel Eischen wrote:

>> > So do we allow FILE to be extended only after bumping the library
>> > version once (after 5.0-release)? And thereafter all extensions to
>> > FILE do not need a version bump?
>>
>> We've already bumped libc for 5.x. Assuming this works ok, we shouldn't need
>> any further bumps for extending FILE.
>
>True. I guess the real problem is the other libraries that reference
>stdin, stdout, stderr. These need to be rebuilt with the new stdio.h
>and libc in order to avoid any impact from future FILE changes.

I might sound like the harbinger of doom, but you have to bump the major number on every library that uses stdio to solve the "FILE has changed size" problem. It's the same sort of problem that changing errno caused. That was "solved" by the switch to elf, which caused global recompilation.

People are hoping to do this by just waiting. Eventually most libraries will experience a major version bump. Similarly, most useful programs will be recompiled (either against bumped libraries, or recompiled old ones). But some programs will not be recompiled, and will fail in mysterious ways. I often use really old binaries, so odds are it will happen to me. :-)

To prevent old binaries from going bad, the libraries they link to must use the old version of stdio. Definite ideas of the offset in __sF of stdout and stderr are embedded in both the old programs and the old libraries (and of course, the old version of stdio). If you recompile the libraries against the new stdio, you break the old binaries. The solution is to not do that.

In short, when FILE changes size (and hence __sF offsets change), then every consumer(*) of stdio must be bumped. The recent __stdinp (and friends) addition prevents this problem happening again in the future, but does not solve the current problem of old binaries and old libraries knowing the internals of stdio.

Stephen.

(*) OK, technically only uses of "stdout" and "stderr" variables screw up when FILE changes size. Uses of macros (like getc variants that are sometimes macros) will screw up if offsets change, but that's easier to avoid.
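The offset problem can be shown with a small sketch. The two structs below are hypothetical stand-ins for FILE before and after a field is added, not the real stdio layout; the point is only that a reference such as &__sF[1] compiles down to "array base plus index times sizeof(FILE)", so any size change moves every stream except the first.

```c
#include <stddef.h>

/* Hypothetical stand-ins for FILE before and after it grows a field. */
struct old_file { int flags; int fd; };
struct new_file { int flags; int fd; long extra; };

/*
 * Byte offset of element `index` in an array of `elem_size`-byte objects.
 * This is the constant a compiler bakes into code that takes &__sF[index].
 */
size_t sf_offset(size_t elem_size, size_t index)
{
    return elem_size * index;
}

/*
 * Nonzero when the offset compiled into an old binary for stream `index`
 * differs from the offset used by a library rebuilt against the new struct.
 */
int offsets_disagree(size_t index)
{
    return sf_offset(sizeof(struct old_file), index)
        != sf_offset(sizeof(struct new_file), index);
}
```

With these stand-ins, index 0 (stdin) still agrees, while indexes 1 and 2 (stdout and stderr) do not, which matches the footnote above.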
A tiny Perl bug?
I was trying to get FreeBSD 4.2-BETA to compile under FreeBSD 3.4 when I found that the use of the new setresgid() and setresuid() system calls was causing the perl5 compile to fail. I got around this using NOPERL=yup but while investigating I noticed an apparent bug in the use of setresgid() and propose this patch:

Index: mg.c
===================================================================
RCS file: /cvs/src/contrib/perl5/mg.c,v
retrieving revision 1.1.1.4
diff -u -r1.1.1.4 mg.c
--- mg.c	2000/08/20 08:42:14	1.1.1.4
+++ mg.c	2000/11/22 12:01:32
@@ -1926,7 +1926,7 @@
 	    (void)setregid((Gid_t)PL_gid, (Gid_t)-1);
 #else
 #ifdef HAS_SETRESGID
-      (void)setresgid((Gid_t)PL_gid, (Gid_t)-1, (Gid_t) 1);
+      (void)setresgid((Gid_t)PL_gid, (Gid_t)-1, (Gid_t)-1);
 #else
 	if (PL_gid == PL_egid)			/* special case $( = $) */
 	    (void)PerlProc_setgid(PL_gid);

I assume this was just a typo. I can't think of any reason to try to set the saved gid to daemon.

I'd whip in and commit this myself, but I'm sure there are "vendor branch considerations", and I've never found out what's involved with that.

And piggybacking a slightly wider issue: The cross-tools section of Makefile.inc1 is supposed to address the use of new system calls and such in build tools, right? Can we forget about the old "try to use the new syscall and do something else if it isn't there" code? And all we need to do to fix my migration problem is to MFC marcel's miniperl cross-build fix? Right?

Otherwise I have all this blather I was going to say about using fancy new syscalls in perl just to emulate old syscalls we already have, and the way that makes upgrading harder. But I don't have to go on about that, it seems. :-)

Stephen.
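For reference, a minimal sketch of the convention the patch relies on: passing (gid_t)-1 to setresgid() leaves that particular ID unchanged. The helper names are made up for illustration and are not from perl5. The _GNU_SOURCE define is an assumption for glibc systems, where these prototypes are hidden by default; on FreeBSD it should not be needed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* exposes setresgid/getresgid on glibc */
#endif
#include <sys/types.h>
#include <unistd.h>

/*
 * Change only the real gid. The (gid_t)-1 arguments tell the kernel to
 * leave the effective and saved gids as they are.
 * Returns 0 on success, -1 on error.
 */
int set_real_gid_only(gid_t gid)
{
    return setresgid(gid, (gid_t)-1, (gid_t)-1);
}

/*
 * Re-set the real gid to its current value (always permitted) and check
 * that the effective and saved gids were not disturbed.
 * Returns 1 if they were preserved, 0 if not, -1 on error.
 */
int other_gids_preserved(void)
{
    gid_t r0, e0, s0, r1, e1, s1;

    if (getresgid(&r0, &e0, &s0) != 0)
        return -1;
    if (set_real_gid_only(r0) != 0)
        return -1;
    if (getresgid(&r1, &e1, &s1) != 0)
        return -1;
    return e1 == e0 && s1 == s0;
}
```

By contrast, a third argument of (gid_t) 1 asks for the saved gid to become group 1, which is what the unpatched line does.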
Re: Is compatibility for old aout binaries broken?
On Saturday, 16th December 2000, "Donald J . Maddox" wrote:

>The other day, on a whim, I decided to try running an old binary
>of SimCity (the same one found in the 'commerce' directory on
>many FBSD cds), and it failed in a odd way...

You and I may be the only people in the world that run old binaries.

This has been broken for new users for some time. :-( Those of us upgrading from source have been immune to this problem, because we retain the old a.out ld.so binary.

>/usr/libexec/ld.so: Undefined symbol "___error" called from sim:/usr/X11R6/lib
>/aout/libX11.so.6.1 at 0x20160644

When errno became a function that returns a pointer (previously it was a simple integer variable), recompiled libraries became incompatible with old binaries. So, I hacked the a.out loader (ld.so). The fix was in 3.0. Well, Nate called it a horrible hack, so maybe I should say "the hack was in 3.0".

>Am I overlooking something obvious here, or is something actually
>broken with respect to running old aout binaries?

I found that rtld-aout won't compile. That's kinda broken. (It's probably something simple. Looks like the a.out version of a pic library just isn't around any more). I'll try harder later. What's certain is that it isn't compiled by default.

I poked about with my old FreeBSD CD collection and found that version 3.0 through 3.2 have a fully functioning (fully hack enabled) ld.so, but an older binary has been substituted in 3.3 and onward, including 4.0 and 4.1, and most likely 4.2 also.

I can only guess that some anonymous release engineer (nobody we know :-) picked the wrong CD at some point to get the master copy of ld.so once it stopped compiling. (Or at least stopped being easily compiled.)

Ideally, rtld-aout would be compiled fresh for every release. Until then, you can repair your system by retrieving ld.so from a 3.3 CD (in the compat22 section), or from a 3.2 live filesystem CD.

Stephen.
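A rough illustration of the errno change described above. The names my_error and my_errno are hypothetical stand-ins (the real header defines errno in terms of __error(), which shows up as ___error in a.out symbol tables because of the leading underscore). Code compiled against the new header no longer references an integer variable; every use becomes a call to a function, so that function must exist somewhere at run time.

```c
/* Old style: errno is a plain global integer that binaries reference directly. */
int old_errno;

/* New style: a function hands back the address of the current errno value. */
static int errno_storage;

int *my_error(void)
{
    return &errno_storage;
}

/* Every use of my_errno now compiles to a call to my_error(). */
#define my_errno (*my_error())

/* Store a value through the macro and read it back the same way. */
int set_and_read(int value)
{
    my_errno = value;
    return my_errno;
}
```

A library built this way needs my_error() to be supplied by something it links against. If the only libc on offer predates the function, the run-time linker reports an undefined symbol, which is the failure quoted in the message.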
Re: Is compatibility for old aout binaries broken?
On Sunday, 17th December 2000, "Donald J . Maddox" wrote:

>Under the circumstances, it seems silly to have aout conpat
>bits installed at all, seeing as how they cannot work.

Old programs that don't depend on recompiled libraries are fine. I can't guess at the percentages though. Also, nearly everybody has recompiled for elf, where this problem never occurred.

>Like you, I normally upgrade from source -- This box has
>been -current ever since 2.0.5 or so was -current, but I
>had to reinstall from scratch a while back by installing
>4.2-RELEASE and then cvsupping back to -current, so I
>guess I lost my working aout ld.so in the process. Bummer :(

I expected some build tool expert to say "Just compile with these options". But they haven't. So I'll see if the bits have rotted, or whether we can keep building ld.so instead of just including an age old binary.

Stephen.
Re: Is compatibility for old aout binaries broken?
On Monday, 18th December 2000, "Donald J . Maddox" wrote:

>On Mon, Dec 18, 2000 at 04:41:17PM +1000, Stephen McKay wrote:
>>
>> I expected some build tool expert to say "Just compile with these
>> options". But they haven't. So I'll see if the bits have rotted,
>> or whether we can keep building ld.so instead of just including
>> an age old binary.

>Well, if you do manage to uncover the lost magic, please let me know :)

It's getting a little more magic every day to generate a.out stuff, but not all that bad. Basically I built lib/csu/i386, gnu/lib/libgcc, lib/libc and libexec/rtld-aout, in order, with these settings:

    NOMAN=yup
    DESTDIR=""
    OBJFORMAT=aout
    MAKEOBJDIRPREFIX=/usr/obj/aout

In each directory, I used make obj, make, make install. (By the way, there are a lot of twisty little passages in /usr/share/mk. One of them required me to add DESTDIR="", which should be a NOP.)

The generated ld.so has bloated a bit :-) but works fine. So we could in principle build ld.so for every release. It's just a question of whether we should. I think we should. But it might be just as easy to copy it off the 3.3 CD every time. It's dead end stuff after all.

Does the release engineer have an opinion?

Stephen.
Re: Is compatibility for old aout binaries broken?
On Tuesday, 19th December 2000, Stephen McKay wrote:

>But it might be just as easy to copy it off the 3.3 CD every time.

Oops! As I wrote earlier, 3.3 and onward have the broken ld.so. Good copies are found on 3.0 through to 3.2. Sorry for veering off the road there. :-)

Stephen.
Re: Is compatibility for old aout binaries broken?
On Monday, 18th December 2000, Jordan Hubbard wrote:

>> The generated ld.so has bloated a bit :-) but works fine. So we could
>> in principle build ld.so for every release. It's just a question of
>> whether we should. I think we should. But it might be just as easy
>> to copy it off the 3.3 CD every time. It's dead end stuff after all.
>>
>> Does the release engineer have an opinion?
>
>If it's just for the compat3x distribution, I say check it into that
>part of lib/compat and be done with it. Uudecoding it each time is a
>lot easier than building it. Or are we talking about ld.so in some
>different context?

I hadn't noticed all the uuencoded things in lib/compat before. This is obviously the way to fix it.

By the way, it's the compat22 distribution that needs fixing, and, as previously noted, it's the 3.2 CD that has the last fully working ld.so.

I'll get onto committing a fix.

Stephen.
Re: No cable modems??
On Tuesday, 19th December 2000, "Donald J . Maddox" wrote:

>Why are you (or your ISP) refusing to accept mail from people
>with cable modems? Enquiring minds want to know... ;-)

> - Transcript of session follows -
>... while talking to frmug.org.: MAIL From:<[EMAIL PROTECTED]>
><<< 550 no cable modems here
>554 5.0.0 [EMAIL PROTECTED] Service unavailable

It's a spam reduction move. I'm surprised hub.freebsd.org accepts your mail! You should funnel your mail through your ISP's central mail hub.

Followups to -chat, I think.

Stephen.
Re: Is compatibility for old aout binaries broken?
On Wednesday, 20th December 2000, "David O'Brien" wrote:

>On Mon, Dec 18, 2000 at 02:58:16AM +1000, Stephen McKay wrote:
>> This has been broken for new users for some time. :-( Those of us
>> upgrading from source have been immune to this problem, because we
>> retain the old a.out ld.so binary.
>>
>> >/usr/libexec/ld.so: Undefined symbol "___error" called from
>> >sim:/usr/X11R6/lib /aout/libX11.so.6.1 at 0x20160644
>>
>> When errno became a function that returns a pointer (previously it was
>> a simple integer variable), recompiled libraries became incompatable with
>> old binaries. So, I hacked the a.out loader (ld.so). The fix was in 3.0.
>> Well, Nate called it a horrible hack, so maybe I should say "the hack was
>> in 3.0".
>
>src/lib/libc/sys/__error.c suggests this was the case for 2.2.7+.

No, you want rev 1.10 of sys/sys/errno.h. That was when it affected all a.out binaries. Until then it was just threaded binaries, a vanishingly small proportion. Rev 1.10 was in 3.0. Rev 1.5 was in the 2.2.x releases.

>What is out of sync is the X11 a.out libs. They are probably built on a
>2.2.7 or 2.2.8 box, thus they refer to `___error' vs. `errno'. These
>libs are wrong for the SimCity binary. They are a.out yes, but not
>proper for compat20 use. Since SimCity needs `libgcc.so.261', I'll
>assume it was built that long ago.

Correcting slightly for your slightly off assumption: The X11 libs were probably built on a 3.x box. Their problem is that being newer than libc.so.2.2 (or was it libc.so.3.0) they use ___error but libc does not supply it. My patches to rtld-aout (that first appeared in FreeBSD 3.0) supply ___error in this case. This is the only full fix for this situation.

>The problem isn't as much ld.so, as it should match the libc.so, et.al.
>you are using from the compat2[01] dist (needed to satisfy ``ldd
>lib/SimCity/res/sim''). And `ld.so' and the shared libs would be
>consistent on the system the a.out program was built on.

There was an enormous thread in -current (I think) at the time (mid 1998). The end result was that the ld.so hack was the only solution other than mandating a major bump to every library in existence. Nobody liked either of those solutions :-) but I put the ld.so hack in and the problem disappeared.

Emphasis again: the workaround ld.so was only found in 3.0 and onward, so just using a 2.2.x ld.so isn't enough.

>What I would feel most comfortable with, is doing a MFC to RELENG_2_2 of
>the rtld-aout changes since then, building a new `ld.so' and putting that
>in the compat2? dists. Problem is I don't have access to a 2.2-STABLE
>box.

I have built a binary on 4.2-RELEASE. I think I prefer that because any security fixes in libc (or whatever) will be reflected in the resulting ld.so. In fact, I think we should build ld.so from source until such time as a.out building capability is removed (5.0 perhaps).

On the other hand, merging back to 2.2.x and rebuilding should provide a working (and hack enabled) ld.so that has no more problems than the old binaries it is supporting.

>> I poked about with my old FreeBSD CD collection and found that
>> version 3.0 through 3.2 have a fully functioning (fully hack enabled)
>> ld.so, but an older binary has been substituted in 3.3 and onward,
>> including 4.0 and 4.1, and most likely 4.2 also.
>
>Are you sure? src/lib/compat/compat2[012]/ld.so.gz.uu are all at
>rev 1.1. So there has been no change to them over the lifetime of their
>existence. All three are identical -- having the same MD5 checksum.
>Well, looking at the release tags compat22/ld.so was in 3.2.
>compat2[01]/ld.so was added for 3.3.

This very fact is bothering me a lot. Get out your 3.2 disks and verify that they do not match these uuencoded binaries. Check the 3.0 and 3.1 disk 2 (live file system) and see that they don't match them either.

>> I can only guess that some anonymous release engineer (nobody we know :-)
>> picked the wrong CD at some point to get the master copy of ld.so once
>> it stopped compiling. (Or at least stopped being easily compiled.)
>
>Not quite. I seem to remember that JKH was makeing a tarball of a.out
>libs from what ever was on his box at the time (thus probably the last
>a.out ld.so just before E-day on 3-CURRENT).

Something like this must have happened up to and including the 3.2 release.

>When I committed the
>compat2? bits, I took ld.so from a 2.2.x release as this is the compat2?
>dist, not compat3.aout dist. Which is what you're suggesting should have
>been done.

You missed the fact that fixes were added to ld.so after those releases even though the purpose of ld.so is to run binaries that date fr
Re: Is compatibility for old aout binaries broken?
On Wednesday, 20th December 2000, "Donald J . Maddox" wrote:

>> > Looks good. Can you install the XFree896-aoutlib port? You may have
>> > seen were someone posted the a.out libs from 3.3.6 are known to not be
>> > the the best to use for compatibility use.
>
>Interesting. After I installed the XFree86-aoutlibs port, SimCity
>works fine for me (on an 8-bit display)...
>
>It didn't work with the X libs built by the port when aout libs
>are requested, and it didn't work with the X libs from 3.3.6, but
>it works with these.

If the XFree86-aoutlibs libraries are old enough, they will not call ___error. That is sufficient to solve your particular problem, but not to solve the general case.

I'm now wondering if the reason that people don't like the XFree86 3.3.6 a.out libraries is the problem with ___error and the older ld.so supplied with recent FreeBSD releases.

Stephen.
Re: Is compatibility for old aout binaries broken?
On Wednesday, 20th December 2000, "Donald J . Maddox" wrote:

>On Wed, Dec 20, 2000 at 10:14:09AM -0800, David O'Brien wrote:
>> On Wed, Dec 20, 2000 at 11:15:55PM +1000, Stephen McKay wrote:
>> > Correcting slightly for your slightly off assumption: The X11 libs were
>> > probably built on a 3.x box. Their problem is that being newer than
>> > libc.so.2.2 (or was it libc.so.3.0) they use ___error but libc does not
>> > supply it. My patches to rtld-aout (that first appeared in FreeBSD
>> > 3.0) supply ___error in this case. This is the only full fix for this
>> > situation.
>>
>> Why is not changing the XFree86-aoutlibs port to offer libs built on
>> 2.2.x not the right fix?
>
>I was under the impression that this was already the case... The libs
>in the XFree86-aoutlibs port ARE from 2.2.x. My problem was that I
>was using libs built on 3.x.

(I think I can save a lot of typing by replying to this message. I'm just about to leave town.)

My whole point is that generating a.out binaries and libraries didn't stop the instant that 3.0 hit the streets. To support the mixture of old binary plus new library you need a hacked ld.so. We have to supply it somehow, or simply say we don't care about certain binaries dying with obscure error messages. This XFree86-aoutlibs vs libs built on 3.x example supports my theme.

I can't reconcile your naming convention (ie compat22 bits originated on a 2.2.x box) with my version (compat22 is used to support 2.2.x binaries). I'm also not afraid that a binary generated on 4.2 would have hidden defects. I'm more worried that one generated on 2.2.x would have defects we've forgotten about.

If you don't mind pausing the whole argument for about 4 days, I can rejoin. :-)

Stephen.
Fixing a.out compatibility
I'll try to summarise the position so far:

1) Legacy a.out executable support is broken for a subset (size unknown) of such executables.

2) We can ignore this or repair this.

3) We can build a new binary or just look around on old 3.x CDs until we find one that works.

4) We can generate a working binary on 4.x or on 2.2.8-stable (after some fixing).

5) We can generate ld.so anew each release, or generate it (or find it) once and commit a binary.

I don't think there's any doubt about point 1. All a.out executables that use libc.so.2.2 and another recompiled library will fail because of a missing routine (__error) required by the recompiled library and not supplied by libc or by executable or by the existing ld.so. All these executables come from the 2.2.x era or earlier. Those built in the 3.x era use libc 3.1 and don't have this problem.

Urk... Actually, it's slightly more complicated than that since the libc.so.3.1 built on 2.2.6 (for example) didn't contain __error() but the one built on 2.2.7 did. (At least according to the cvs logs). I'm most annoyed that I can't find my 2.2.6 CDs. 2.2.5 had libc 3.0 (without __error) and 2.2.7 had libc 3.1 (with __error) but the cvs logs say that 2.2.6 should have had a different libc 3.1 (without __error). So, the exact "version" of version 3.1 of libc could be important. Yuck.

We don't normally ignore things we can fix, so point 2 is resolved in favour of fixing this, right?

We need to build a new binary since we (collectively) have forgotten where the working 3.0 through 3.2 binaries came from. :-( Can we, for example, prove that revision 1.57 made it into any release?

It seems feasible to generate a new binary on a recent or an old patched FreeBSD version. The question is which is better. I think the newer the better. Otherwise, who is going to build the 2.2.8-stable box to make this one binary? I've already built a binary on 4.2-release that works.

We disagree a bit over point 5. I think it is feasible and desirable to build ld.so at each release. If we don't build it for each release, how will fixes to rtld-aout and required libraries (eg libc) be incorporated? I say keep building it fresh until a.out builds are impossible. Or are you suggesting that each advance in 4.x and beyond be backported to 2.2.8-stable so that we can build one binary?

So, where to from here? Despite all my arguments, I could just commit the binary I have to the lib/compat2* areas and leave it at that.

Stephen.

PS Thanks for all the "old_RELENG_2_2" etc tags now available in rtld-aout.
Re: Fixing a.out compatibility
[Noted that you don't like being cc:'d, David. On the other hand, I like to be kept in the cc: list.]

On Tuesday, 26th December 2000, "David O'Brien" wrote:

>On Wed, Dec 27, 2000 at 02:01:24AM +1000, Stephen McKay wrote:
>> I'll try to summarise the position so far:
>>
>> 1) Legacy a.out executable support is broken for a subset (size unknown)
>> of such executables.
>
>Define "legacy". I have been speaking specifically about FreeBSD 2.2
>support. That just happens to a.out based.
>
>You seem to mean it to be any a.out binary.

Not really. If I generate an a.out binary right now, it can't suffer from this problem, even though it uses ld.so when it runs. Only a certain set of old a.out binaries are affected.

>From my standpoint only bits generated on a 2.2.x host can go into the
>compat22 distribution. When compat1x was created (being the first it
>gets to imply the intention of the compat dists) it gave the ability to
>run FreeBSD 1.x binaries; not 2.0 a.out ones, not any binary after the
>last 1.x release. Thus why I claim compat22 is *not* about being an
>a.out compat dist, but one to properly run 2.2.x binaries on a later
>version of FreeBSD. If 3.0 had been a.out based, there still would be a
>compat22 dist.

We almost completely agree, but... The only a.out binaries with problems come from that 2.2/2.1 era. To support them we need an ld.so from *after* that era. I can't see how you get around this. That working ld.so was in 3.0 and was certainly not generated on a 2.2.x host.

I think your restriction on compat contents is a useful guideline, but to be broken when necessary.

>Thus someone that still has access to a 2.2.8-stable box needs to merge
>the changes in src/libexec/rtld-aout (in -current) to
>src/gnu/usr.bin/ld{,/rtld} and build a new binary for inclusion in the
>compat22 dist.

I'll build one if I have to. I'm trying to avoid unnecessary work, since I expect there are few others bothered enough to fix this problem.

>Note, when the bits were CVS repo copied into rtld-aout, all the tags
>were stripped. I spent the time to add them all back to make the merge
>easier for someone. Whoever does this should please CVSup before
>starting.

Could very well be me. But I would be patching the old location, surely?

>> I don't think there's any doubt about point 1. All a.out executables that
>> use libc.so.2.2 and another recompiled library will fail because of a
>> missing routine (__error) required by the recompiled library and not
>> supplied by libc or by executable or by the existing ld.so.
>
>Agreed, but "and another recompiled library", means this a.out
>executable was not built on a 2.2.x host. Otherwise there would be no
>way to have this inconsistency.

This is the fundamental point of this problem. The executable was built on a 2.2.x or 2.1.x box and originally used libraries compiled then or earlier. The whole problem is the fact that libraries were recompiled later and did not change version numbers. There was no way to force external parties to update version numbers, and folks round here didn't feel like bumping all the FreeBSD library version numbers.

This is why I keep the words "executable" and "library" separate. The library is newer than the executable, and this causes the executable to fail. This is the fact that I'm not at all sure that you understand.

>Actually one problem is I put the 2.2.8 ld.so in the compat2[01] dist.
>That was wrong of me. I can correct that. SimCity (the binary used as
>an example) required me to install the comapt20 and compat21 dists. The
>other problem is we don't have a compat2[01] XFree86 libs dist. We only
>have an a.out one that is intended to cover all a.out binaries, and it
>doesn't correctly.

We can only install one ld.so. It has to cover all bases. Are you suggesting that each compat2x dist install a different ld.so? This is consistent with your claim that "compat2x bits come from 2.x", but not very useful in practice. Should I assume you meant to delete ld.so from all but one compatxx dist?

>> 2.2.5 had libc 3.0 (without __error) and 2.2.7 had libc 3.1 (with
>> __error) but the cvs logs say that 2.2.6 should have had a different
>> libc 3.1 (without __error). So, the exact "version" of version 3.1 of
>> libc could be important. Yuck.
>
>The compat22 dist used the 2.2.8 bits, so I don't see how it wasn't the
>``exact "version" of version 3.1 of libc''.

What I was going on about here is that important changes occurred to libraries without a version bump, and one such library was libc. It is making my attempt to describe the boundary of the problem very difficul
panic: vm_object_qcollapse(): object mismatch
Hardware: 486DX2/66 16Mb ram, aha1542CF, 2x1Gb SCSI disks
Software: 4.0-current 1-2 days old, softupdates (vm_map.c is at rev 1.146, for example)

I was running 'make -j5 buildworld'. It swaps like crazy when I do this. :-)

Here's what gdb -k tells me:

...
#9  0xf01425e0 in panic (
    fmt=0xf0225c1f "vm_object_qcollapse(): object mismatch")
    at ../../kern/kern_shutdown.c:446
#10 0xf01e0772 in vm_object_qcollapse (object=0xf2f001d0)
    at ../../vm/vm_object.c:1011
#11 0xf01e08d6 in vm_object_collapse (object=0xf2f001d0)
    at ../../vm/vm_object.c:1102
#12 0xf01ddae2 in vm_map_copy_entry (src_map=0xf2f4aa00, dst_map=0xf2f4ad00,
    src_entry=0xf2ed0e10, dst_entry=0xf2f8edc0) at ../../vm/vm_map.c:2284
#13 0xf01ddd73 in vmspace_fork (vm1=0xf2f4aa00) at ../../vm/vm_map.c:2411
#14 0xf01da833 in vm_fork (p1=0xf2f7db20, p2=0xf2d751e0, flags=20)
    at ../../vm/vm_glue.c:231
#15 0xf013d4f0 in fork1 (p1=0xf2f7db20, flags=20)
    at ../../kern/kern_fork.c:447
#16 0xf013ce65 in fork (p=0xf2f7db20, uap=0xf3021f94)
    at ../../kern/kern_fork.c:99
#17 0xf01fe783 in syscall (frame={tf_es = 134807599, tf_ds = -272695249,
      tf_edi = 134750909, tf_esi = 134935201, tf_ebp = -272643652,
      tf_isp = -217964572, tf_ebx = 4, tf_edx = 672250004, tf_ecx = 19,
      tf_eax = 2, tf_trapno = 12, tf_err = 2, tf_eip = 671826564, tf_cs = 31,
      tf_eflags = 662, tf_esp = -272651296, tf_ss = 47})
    at ../../i386/i386/trap.c:1100
#18 0xf01f4e9c in Xint0x80_syscall ()
...

(kgdb) p *p
$1 = {pageq = {tqe_next = 0xf02c5240, tqe_prev = 0xf02e4e00}, hnext = 0x0,
  listq = {tqe_next = 0xf02e59d0, tqe_prev = 0xf2f69cc8},
  object = 0xf2f69cb0, pindex = 30, phys_addr = 15065088, queue = 4,
  flags = 1, pc = 0, wire_count = 0, hold_count = 0, act_count = 27 '\e',
  busy = 0 '\000', valid = 255 'ÿ', dirty = 255 'ÿ'}
(kgdb) p object
$2 = (struct vm_object *) 0xf2f001d0
(kgdb) p *object
$3 = {object_list = {tqe_next = 0xf2fdc2b8, tqe_prev = 0xf2f69c3c},
  shadow_head = {tqh_first = 0x0, tqh_last = 0xf2f001d8}, shadow_list = {
    tqe_next = 0x0, tqe_prev = 0xf2f69cb8}, memq = {tqh_first = 0xf02dbcb0,
    tqh_last = 0xf02cc86c}, generation = 11690, type = OBJT_DEFAULT,
  size = 32, ref_count = 2, shadow_count = 0, pg_color = 0,
  hash_rand = -136756254, flags = 8576, paging_in_progress = 0, behavior = 0,
  resident_page_count = 6, cache_count = 0, wire_count = 0,
  backing_object = 0xf2f69cb0, backing_object_offset = 0x, last_read = 0,
  pager_object_list = {tqe_next = 0xf2f69000, tqe_prev = 0xf0252f10},
  handle = 0x0, un_pager = {vnp = {
      vnp_size = 0x}, devp = {devp_pglist = {tqh_first = 0x0,
        tqh_last = 0x0}}, swp = {swp_bcount = 0}}}
(kgdb) p *(p->object)
$4 = {object_list = {tqe_next = 0xf2f915e4, tqe_prev = 0xf30fd0e8},
  shadow_head = {tqh_first = 0xf2f001d0, tqh_last = 0xf2f001e0},
  shadow_list = {tqe_next = 0x0, tqe_prev = 0xf30fef04}, memq = {
    tqh_first = 0xf02e7170, tqh_last = 0xf02cff5c}, generation = 10219,
  type = OBJT_SWAP, size = 32, ref_count = 3, shadow_count = 1, pg_color = 0,
  hash_rand = -136000830, flags = 384, paging_in_progress = 0, behavior = 0,
  resident_page_count = 4, cache_count = 1, wire_count = 0,
  backing_object = 0x0, backing_object_offset = 0x, last_read = 29,
  pager_object_list = {tqe_next = 0xf30fad24, tqe_prev = 0xf30f0814},
  handle = 0x0, un_pager = {vnp = {
      vnp_size = 0x0001}, devp = {devp_pglist = {tqh_first = 0x1,
        tqh_last = 0x0}}, swp = {swp_bcount = 1}}}

I'll keep this dump around. What other details do people want? I'm not likely to even get to look at this let alone solve it. Bummer.

Stephen.
Re: Possible fix for rc.conf
On Sunday, 21st March 1999, Richard Wackerbarth wrote:

>Why do we need to have ANY of the file inclusion in /etc/defaults/rc.conf?
>Shouldn't that file simply be definitions of variables?
>IMHO, the "logic" should be in "rc" itself.

Yeah! What he said! Having code in rc.conf sucks. If there is no logic, there can be no recursion.

If you are going to mix code into rc.conf you may as well just suck it back into /etc/rc and get rid of it entirely. (*)

Stephen.

(*) Which is silly, of course.
Fatal trap 1: privileged instruction fault while in kernel mode
I've just got what seems an unlikely panic. How could I get a privileged instruction fault while in kernel mode?

This is from a week old 4.0-current kernel on a 16MB 486. It has an AHA1542CF, a slow SCSI-1 disk, and a rebadged TDC4200 (2GB QIC). I run soft updates but nothing else fancy. I was doing a make buildworld, and rewinding a tape at the time. It seemed like the panic occurred when the tape stopped, but I wasn't actually watching at the time.

The fatal instruction is:

0xc016abc9 :movl $0xc023355c,0xffdc(%ebp)

which looks pretty ordinary. Can this be a software bug? Or has my hardware gone funny? It has done a good number of make worlds in the last few months with only the normal (software) troubles you expect from -current.

Stephen.

Here are the gory bits, in case anyone can offer any hints:

IdlePTD 2834432
initial pcb at 248774
panicstr: from debugger
panic messages:
---
Fatal trap 1: privileged instruction fault while in kernel mode
instruction pointer = 0x8:0xc016abc9
stack pointer = 0x10:0xc30a6c20
frame pointer = 0x10:0xc30a6c54
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 12245 (sh)
interrupt mask = bio
panic: from debugger
panic: from debugger
dumping to dev 30401, offset 163840
dump 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
---
#0 boot (howto=260) at ../../kern/kern_shutdown.c:287
287 dumppcb.pcb_cr3 = rcr3();
(kgdb) where
#0 boot (howto=260) at ../../kern/kern_shutdown.c:287
#1 0xc0148095 in panic (fmt=0xc021bda8 "from debugger") at ../../kern/kern_shutdown.c:448
#2 0xc0129575 in db_panic (addr=-1072256055, have_addr=0, count=-1, modif=0xc30a6acc "") at ../../ddb/db_command.c:432
#3 0xc0129515 in db_command (last_cmdp=0xc023558c, cmd_table=0xc02353ec, aux_cmd_tablep=0xc0245f8c) at ../../ddb/db_command.c:332
#4 0xc01295da in db_command_loop () at ../../ddb/db_command.c:454
#5 0xc012b95b in db_trap (type=1, code=0) at ../../ddb/db_trap.c:71
#6 0xc01f9bea in kdb_trap (type=1, code=0, regs=0xc30a6be4) at ../../i386/i386/db_interface.c:157
#7 0xc0203a00 in trap_fatal (frame=0xc30a6be4, eva=0) at ../../i386/i386/trap.c:938
#8 0xc02034b0 in trap (frame={tf_es = 16, tf_ds = -1023541232, tf_edi = -1023478144, tf_esi = 0, tf_ebp = -1022727084, tf_isp = -1022727156, tf_ebx = -1024550784, tf_edx = 12245, tf_ecx = -1023478144, tf_eax = 0, tf_trapno = 1, tf_err = 0, tf_eip = -1072256055, tf_cs = -1072300024, tf_eflags = 66194, tf_esp = -1024550784, tf_ss = -1024177152}) at ../../i386/i386/trap.c:586
#9 0xc016abc9 in vclean (vp=0xc2ee9880, flags=8, p=0xc2fef680) at vnode_if.h:835
#10 0xc016adb7 in vgonel (vp=0xc2ee9880, p=0xc2fef680) at ../../kern/vfs_subr.c:1830
#11 0xc01698b1 in getnewvnode (tag=VT_UFS, mp=0xc05c5e00, vops=0xc05aa000, vpp=0xc30a6d04) at ../../kern/vfs_subr.c:467
#12 0xc01d4e69 in ffs_vget (mp=0xc05c5e00, ino=20442, vpp=0xc30a6d84) at ../../ufs/ffs/ffs_vfsops.c:1082
#13 0xc01d8b1a in ufs_lookup (ap=0xc30a6ddc) at ../../ufs/ufs/ufs_lookup.c:538
#14 0xc01dd38d in ufs_vnoperate (ap=0xc30a6ddc) at ../../ufs/ufs/ufs_vnops.c:2309
#15 0xc0166978 in vfs_cache_lookup (ap=0xc30a6e38) at vnode_if.h:55
#16 0xc01dd38d in ufs_vnoperate (ap=0xc30a6e38) at ../../ufs/ufs/ufs_vnops.c:2309
#17 0xc0168dc1 in lookup (ndp=0xc30a6eb8) at vnode_if.h:31
#18 0xc0168894 in namei (ndp=0xc30a6eb8) at ../../kern/vfs_lookup.c:152
#19 0xc016e124 in stat (p=0xc2fef680, uap=0xc30a6f94) at ../../kern/vfs_syscalls.c:1651
#20 0xc0203c83 in syscall (frame={tf_es = 134873135, tf_ds = -1078001617, tf_edi = 0, tf_esi = 134890328, tf_ebp = -1077946844, tf_isp = -1022726172, tf_ebx = 134890372, tf_edx = -1077946944, tf_ecx = 134890388, tf_eax = 188, tf_trapno = 22, tf_err = 2, tf_eip = 134656096, tf_cs = 31, tf_eflags = 518, tf_esp = -1077946968, tf_ss = 47}) at ../../i386/i386/trap.c:1101
#21 0xc01fa53c in Xint0x80_syscall ()
#22 0x804b5f1 in ?? ()
#23 0x804a879 in ?? ()
#24 0x804a7f3 in ?? ()
#25 0x804a7f3 in ?? ()
#26 0x804a6fe in ?? ()
#27 0x804a7f3 in ?? ()
#28 0x804ab8a in ?? ()
#29 0x804a7ab in ?? ()
#30 0x804aa23 in ?? ()
#31 0x804a812 in ?? ()
#32 0x8051257 in ?? ()
#33 0x8051183 in ?? ()
#34 0x80480e9 in ?? ()
(kgdb)

The End.

To Unsubscribe: send mail to majord...@freebsd.org with "unsubscribe freebsd-current" in the body of the message
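A note on reading the trap frame in the backtrace: kgdb prints the saved registers as signed decimals, so kernel addresses show up as negative numbers. The sketch below reinterprets such a value as an unsigned 32-bit address, which is how tf_eip = -1072256055 lines up with the instruction pointer 0xc016abc9 and tf_ebp = -1022727084 with the frame pointer 0xc30a6c54. The helper names are made up for illustration and are not part of kgdb or the kernel.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a signed 32-bit register value, as printed in the
 * trap frame, as the unsigned address it encodes. */
static uint32_t as_address(int32_t v)
{
    return (uint32_t)v;
}

/* Format the address as 0x-prefixed hex into a caller-supplied buffer. */
static void format_address(int32_t v, char *buf, size_t len)
{
    snprintf(buf, len, "0x%08" PRIx32, as_address(v));
}
```

Calling format_address(-1072256055, buf, sizeof buf) gives the string used in the "instruction pointer" line of the panic message.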
Re: EGCS breaks what(1)
On Monday, 5th April 1999, Matthew Dillon wrote:

>:char sccs[] = { '@', '(', '#', ')' };
>:char version[] = blahhhfoo;
>:Was contiguous.
>'what' is broken. C does not impose any sort of address ordering
>restriction on globals or autos that are declared next to each other.

Well, it's really an abuse of 'what', and not anything wrong with 'what' itself. It will continue to work fine doing the job it was designed to do.

The NetBSD folks faced this problem some time ago, and I believe their solution was to duplicate the version information. So, version[] is the same as it used to be, and sccs[] is 4 bytes longer than version[] to hold a complete copy, and the @(#) prefix. This is then completely portable.

Alternately, we could jimmy around with the current hack, and prefix it with 4 NULs, and see what happened. Sorry, I haven't tested this idea, as I've not yet made the EGCS jump.

Stephen.
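The duplicated-string layout described in the message can be sketched as follows. This is only an illustration of the idea, not the NetBSD or FreeBSD source: VERSION_TEXT and has_what_marker() are hypothetical names, and the version text is a placeholder.

```c
#include <string.h>

/* Placeholder version text; a real kernel generates this at build time. */
#define VERSION_TEXT "example kernel version string"

/* The plain version string, unchanged for existing users. */
const char version[] = VERSION_TEXT;

/* A complete second copy with the "@(#)" marker in the same array, so
 * nothing depends on how the compiler orders two separate globals. */
const char sccs[] = "@(#)" VERSION_TEXT;

/* Returns 1 if s begins with the marker that what(1) searches for. */
static int has_what_marker(const char *s)
{
    return strncmp(s, "@(#)", 4) == 0;
}
```

Because the marker and the text live in one array, sizeof(sccs) is exactly sizeof(version) + 4, matching the "4 bytes longer" description above.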
Slightly wonky auto memory probe + fix
[I posted this to -current because the technology is the same in -current even though this box will never run -current. Bear with me.]

We've just got a new Dell PowerEdge (very nice) with 512MB of ram. By default, 3.1-stable sees only 64MB. Looking carefully, it sees 8KB less than 64MB, so it doesn't probe for the rest.

I applied this patch, which fiddles the "Hmm got 64MB so probe for the rest" heuristic. With this patch, it found all 512MB, to the exact byte. Unfortunately, it kinda changes it from a "heuristic" to a "hack". :-(

--- machdep.c	Fri Feb 19 15:31:36 1999
+++ /tmp/sgm/machdep.c	Tue Apr 6 23:40:36 1999
@@ -1428,7 +1428,7 @@
 	 * the MAXMEM option or the npx0 "msize", then don't do the speculative
 	 * memory probe.
 	 */
-	if (Maxmem >= 0x4000)
+	if (Maxmem >= 0x3f00)
 		speculative_mprobe = TRUE;
 	else
 		speculative_mprobe = FALSE;
@@ -1538,7 +1538,7 @@
 		if (phys_avail[pa_indx] == target_page) {
 			phys_avail[pa_indx] += PAGE_SIZE;
 			if (speculative_mprobe == TRUE &&
-			    phys_avail[pa_indx] >= (64*1024*1024))
+			    phys_avail[pa_indx] >= (63*1024*1024))
 				Maxmem++;
 		} else {
 			pa_indx++;

Stephen.
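The arithmetic behind the two thresholds can be checked with a small sketch. It assumes Maxmem counts 4KB pages, which is consistent with the patch pairing 0x3f00 with 63*1024*1024 bytes; the function names and PAGE_SIZE_BYTES are invented for this example and do not come from machdep.c.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed i386 page size */

/* Convert a byte count to whole pages, the unit assumed for Maxmem. */
static uint32_t bytes_to_pages(uint32_t bytes)
{
    return bytes / PAGE_SIZE_BYTES;
}

/* The heuristic from the patch: probe for more memory only when the
 * reported size reaches the threshold (both values in pages). */
static int should_probe(uint32_t maxmem_pages, uint32_t threshold_pages)
{
    return maxmem_pages >= threshold_pages;
}
```

With 64MB less 8KB reported, bytes_to_pages() yields 0x3ffe pages: two pages short of the old 0x4000 threshold, but comfortably above the patched 0x3f00 (63MB).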
Re: have live system with NFS client cache problems what do i do?
On Sunday, 11th April 1999, Alfred Perlstein wrote:

>On Sun, 11 Apr 1999, Matthew Dillon wrote:
>
>> doing a 'file cd9660_bmap.o' on laptop (NFS client) gives me a
>> cd9660_bmap.o: MS Windows COFF Unknown CPU
>>
>> An MS Windows binary? Do you have any msdos mounts on
>> the client or server? How is /usr/obj mounted?
>no i have no msdos mounted filesystems, i do however have an
>unmounted win98 partition and a cdrom with joliet extentions mounted
>however the cdrom only contains mp3s.

This is a red herring:

$ dd if=/dev/zero of=foo count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.000114 secs (4487949 bytes/sec)
$ file foo
foo: MS Windows COFF Unknown CPU
$

Look for the usual pack-of-nulls corruption instead.

Stephen.
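One way to look for that kind of corruption is to measure the longest run of zero bytes in a file's contents. The following is a minimal sketch under stated assumptions: both function names are hypothetical, and the threshold is left to the caller (a 512-byte sector or a 4KB page are plausible guesses). Healthy object files contain some zero runs too, so a hit is a hint rather than proof.

```c
#include <stddef.h>

/* Length of the longest run of consecutive zero bytes in buf[0..len). */
static size_t longest_nul_run(const unsigned char *buf, size_t len)
{
    size_t best = 0, run = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0) {
            run++;
            if (run > best)
                best = run;
        } else {
            run = 0;    /* a non-zero byte ends the current run */
        }
    }
    return best;
}

/* Heuristic: flag a buffer containing at least `threshold` NULs in a row. */
static int looks_nul_corrupted(const unsigned char *buf, size_t len,
                               size_t threshold)
{
    return longest_nul_run(buf, len) >= threshold;
}
```

A buffer read from the all-zero file in the dd example would report a run of 512.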
Re: have live system with NFS client cache problems what do i do?
On Sunday, 11th April 1999, Brian Feldman wrote:

>This has nothing to do with DOS. In case you didn't get my other hint:
>{"/home/green"}$ dd if=/dev/zero count=1 2>/dev/null | file -
>standard input: MS Windows COFF Unknown CPU

Don't ya just hate it when your mail is slow! Sigh...

Stephen.
Re: ctm-mail cvs-cur.5292.gz 18/82
On Sunday, 2nd May 1999, Chuck Robey wrote:

>On Mon, 3 May 1999, Jean-Marc Zucconi wrote:
>
>> This one did not arrive in my mailbox. Can someone send it to me? I
>> would like to avoid downloading 6Mbytes again.
>
>I'm going to mail it to you separately, but it might not look like it
>came from me.

I also did not receive part 18. Are the individual parts kept anywhere for anonymous ftp access?

Failures are rare, but they hit the big updates disproportionately and have a bigger effect on bigger updates, so it's a double lose.

Stephen.
Re: Uncommitted dc0 fixes ...
On Wednesday, 4th September 2002, Martin Blapp wrote:

>And this patch here together with patch III made the annoying messages (dc0:
>failed to force tx and rx to idle mode) go away. And I can use now my card
>without to replug the cable over again)

I've been meaning to remove the annoying message for ages. Sorry about that.

>+ if (DC_IS_INTEL(sc)) {
>+     for (i = 0; i < DC_TIMEOUT; i++) {
>+         isr = CSR_READ_4(sc, DC_ISR);
>+         if (isr & DC_ISR_TX_IDLE &&
>+             (isr & DC_ISR_RX_STATE)
>+             == DC_RXSTATE_STOPPED)
>+             break;
>+         DELAY(10);
>+     }
>+ }

Conditionalising on DC_IS_INTEL() means most cards no longer wait until the TX and RX are idle. I don't have enough different if_dc cards to know if this is safe.

On the other hand, every test I've done on my Intel and Macronix cards shows zero calls to DELAY() in this loop. The loop may as well not be there for those card types. Indeed, it isn't there at all in if_de and in a Linux driver I looked at. From this I'm guessing that no 21143 (real or clone) needs this check, though I've got no real proof.

Out of all this fuzzy evidence, I guess the most sensible option is the patch you've proposed. If nobody else is interested, I'll commit this part of your patch cluster on the weekend. I suppose I could do the ADMtek auto tx underrun recover patch too, as it seems harmless to other cards. The other stuff I can't test at all.

This driver represents a counterintuitive state of affairs. I was impressed when Bill Paul managed to support so many clone cards with one driver. But now nobody has enough hardware on hand to test any change properly. There's some sort of lesson to be learnt here.

Stephen.
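To make the behaviour of the gated loop concrete, here is a user-space simulation of it. Everything in it is a stand-in: struct fake_chip, fake_read_isr() and the SKETCH_* constants are invented for this sketch, the bit values are assumed rather than copied from the driver headers, and the delay counter takes the place of DELAY(10).

```c
#include <stdint.h>

#define SKETCH_TX_IDLE    0x00000002u  /* assumed: transmit process stopped */
#define SKETCH_RX_STATE   0x000E0000u  /* assumed: receive state field mask */
#define SKETCH_RX_STOPPED 0x00000000u  /* assumed: receive state "stopped" */
#define SKETCH_RX_WAIT    0x00060000u  /* assumed: receive state "waiting" */
#define SKETCH_TIMEOUT    1000         /* assumed poll limit */

/* Hypothetical model of a card whose engines go idle after some reads. */
struct fake_chip {
    int is_intel;          /* only Intel chips are checked, as in the patch */
    int polls_until_idle;  /* reads needed before idle shows; < 0 = never */
    int reads;             /* status reads performed so far */
    int delays;            /* how many times the loop had to wait */
};

static uint32_t fake_read_isr(struct fake_chip *c)
{
    int idle = c->polls_until_idle >= 0 && c->reads >= c->polls_until_idle;

    c->reads++;
    return idle ? (SKETCH_TX_IDLE | SKETCH_RX_STOPPED) : SKETCH_RX_WAIT;
}

/* Returns 0 if idle was seen (or the check was skipped), -1 on timeout. */
static int wait_for_idle(struct fake_chip *c)
{
    int i;

    if (!c->is_intel)
        return 0;               /* clones: no check at all */
    for (i = 0; i < SKETCH_TIMEOUT; i++) {
        uint32_t isr = fake_read_isr(c);

        if ((isr & SKETCH_TX_IDLE) &&
            (isr & SKETCH_RX_STATE) == SKETCH_RX_STOPPED)
            break;
        c->delays++;            /* stands in for DELAY(10) */
    }
    return i == SKETCH_TIMEOUT ? -1 : 0;
}
```

A chip that is idle on the first read leaves delays at zero, which is the case reported for the Intel and Macronix cards; a chip that never idles burns the whole timeout unless the check is skipped for it.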
Re: dc(4) patch
On Thursday, 19th September 2002, John Baldwin wrote:

>--- if_dc.c 4 Sep 2002 18:14:17 - 1.77
>+++ if_dc.c 19 Sep 2002 20:57:03 -
>@@ -1366,7 +1370,8 @@
> for (i = 0; i < DC_TIMEOUT; i++) {
>     isr = CSR_READ_4(sc, DC_ISR);
>     if (isr & DC_ISR_TX_IDLE &&
>-        (isr & DC_ISR_RX_STATE) == DC_RXSTATE_STOPPED)
>+        ((isr & DC_ISR_RX_STATE) == DC_RXSTATE_STOPPED ||
>+         (isr & DC_ISR_RX_STATE) == DC_RXSTATE_WAIT))
>         break;
>     DELAY(10);
> }

Sadly this change is insufficient to satisfy all cards. The PNIC 82c169 does not idle the transmitter (stays in DC_TXSTATE_WAITEND), though the receiver goes idle OK. The Davicom DM9102 does not idle the receiver when asked (seems to get stuck in DC_RXSTATE_ENDCHECK) though it stops the transmitter OK. Your card does yet another thing. I know these things through 3rd party reports, not because I have any hardware to test.

So at this point I think the best idea is to do the checks only on Intel hardware. At least I can verify that works on a real card I can see with my own eyes.

Another valid option is to send me one of every dc(4) supported card, except genuine Intel and the Macronix 98715AEC.

Stephen.

PS The Intel manual says that one should check bit 8, not the receiver state bits, to see if the receiver is idle. That makes the test:

(isr & DC_ISR_TX_IDLE && isr & DC_ISR_RX_READ)

It doesn't help though since the uncooperative cards don't set that bit either. Also, I think DC_ISR_RX_READ should be spelled as DC_ISR_RX_IDLE.
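The three candidate tests discussed in this message can be compared side by side. The sketch below is illustrative only: the SK_* values are assumed encodings (bit 8 for the receiver-idle flag comes from the PS above, the rest are stand-ins for the driver's header), and the sample status words in the usage note are modelled on the card behaviour described in the text, not measured.

```c
#include <stdint.h>

#define SK_TX_IDLE     0x00000002u  /* assumed: transmitter stopped */
#define SK_RX_IDLE     0x00000100u  /* bit 8: receiver stopped flag */
#define SK_RX_STATE    0x000E0000u  /* assumed: receive state field */
#define SK_RX_STOPPED  0x00000000u  /* assumed encoding */
#define SK_RX_ENDCHECK 0x00040000u  /* assumed encoding */
#define SK_RX_WAIT     0x00060000u  /* assumed encoding */

/* Original test: TX idle and receive state exactly "stopped". */
static int idle_strict(uint32_t isr)
{
    return (isr & SK_TX_IDLE) &&
           (isr & SK_RX_STATE) == SK_RX_STOPPED;
}

/* Patched test: also accept a receiver parked in "wait". */
static int idle_relaxed(uint32_t isr)
{
    uint32_t rx = isr & SK_RX_STATE;

    return (isr & SK_TX_IDLE) &&
           (rx == SK_RX_STOPPED || rx == SK_RX_WAIT);
}

/* Test suggested by the Intel manual: TX idle and bit 8 set. */
static int idle_bit8(uint32_t isr)
{
    return (isr & SK_TX_IDLE) && (isr & SK_RX_IDLE);
}
```

Feeding in a status word with TX idle and the receiver in "wait" passes only the relaxed test, one with the receiver stuck in "end check" passes none, and one where the transmitter never reports idle fails all three. That is why no single version of the test suits every clone.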
Re: dc(4) patch
On Friday, 20th September 2002, John Baldwin wrote:

>On 20-Sep-2002 Stephen McKay wrote:
>> Sadly this change is insufficient to satisfy all cards.
>
>Well. I think we can keep the check for TX going idle and just not do
>the check for RX going idle. The original code basically did this until
>you submitted a patch to wpaul@ that fixed a logic bug (used || above
>instead of &&) that effectively didn't do the RX idle check.

Not quite. Davicom cards (and your card) fail to idle the receiver. PNIC cards fail to idle the transmitter. So it makes just as much sense as any other idea to check those bits only on cards that document that you have to check those bits. My documentation only covers Intel. :-)

>Perhaps we should do the same here? This would be similar to what we do in
>dc_tx_underrun() where we only make sure the TX is idle.

Except that the documentation states you have to idle the TX and RX to change the full duplex bit, whereas you only have to idle the TX to change the transmit fifo threshold.

And in dc_tx_underrun() only the genuine Intel chips are treated specially. Clones seem to work without idling the transmitter. Except the poor Davicom, which gets reset on every underrun (if anyone has one, and it gets underruns, you could try including it with the DC_IS_INTEL(sc) case and see what happens).

Stephen.
Re: dc(4) patch
On Friday, 20th September 2002, John Baldwin wrote:

>On 20-Sep-2002 Stephen McKay wrote:
>> Not quite. Davicom cards (and your card) fail to idle the receiver.
>> PNIC cards fail to idle the transmitter. So it makes just as much
>> sense as any other idea to check those bits only on cards that document
>> that you have to check those bits. My documentation only covers Intel. :-)
>
>Hmm, what if we went back then to waiting until at least one of either
>TX or RX went idle? Did only waiting for one actually break any 21143
>cards?

Well that's the funny thing. It's documented to be necessary on Intel 21143 chips, but I've never seen a non-zero delay between asking for the TX and RX to idle, and observing them to be idle. So we could probably delete the test-and-delay loop entirely.

Waiting for just one of them to go idle, like we have in -stable, is just silly. Would you test for condition "A" and assume that means "B" is OK in any other part of the kernel? It's really hoping that idling the TX and RX take about the same time when there's no reason to believe that. I think the test in -stable is pretty much equivalent to having no test at all.

The only solid documentation I've got demands *both* must be idle. But that's from Intel and describes the original chips. Hence, my view that we should test the bits on Intel chips and forget about it on the clones. Clones tend not to bother implementing all the limitations of the original anyway. If we find a clone that turns out to need the tests, we can enable them for that clone too.

Stephen.
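The difference between waiting for either engine and waiting for both can be shown with a toy timing model. It is a sketch under invented assumptions: first_exit_poll() is a hypothetical helper, and the poll numbers at which TX and RX go idle are made-up inputs, not measurements from any card.

```c
/* Returns the first poll at which the exit test is satisfied, or -1 if
 * max_polls is reached first. tx_idle_at and rx_idle_at are the polls
 * from which each engine reports idle. */
static int first_exit_poll(int tx_idle_at, int rx_idle_at,
                           int require_both, int max_polls)
{
    for (int poll = 0; poll < max_polls; poll++) {
        int tx_idle = poll >= tx_idle_at;
        int rx_idle = poll >= rx_idle_at;
        int done = require_both ? (tx_idle && rx_idle)
                                : (tx_idle || rx_idle);

        if (done)
            return poll;
    }
    return -1;
}
```

With TX idle at poll 2 and RX idle at poll 7, the "either" test exits at poll 2 while the receiver is still running, and the "both" test exits at poll 7. The two only agree when both engines stop at the same moment, which is the coincidence the -stable code relies on.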
Re: cvs commit: src/sys/pci if_dc.c
On Friday, 20th September 2002, Martin Blapp wrote:

>I think we would have to test all cases with all cards. What cards
>do you have Stephen, with which clone Chipsets ? Can you make a list
>of them ?

I've only got DE500 (genuine Intel 21143) and Macronix 98715AEC cards. Nothing PCMCIA or CardBus. Not a very big selection, I know. A lot of us will have to band together to test changes.

>I've got somewhere another dc card which made problems. I guess
>it was PNIC.

PNIC is still a problem with the -current driver.

Stephen.
Re: cvs commit: src/sys/pci if_dc.c
On Friday, 20th September 2002, Martin Blapp wrote:

>mbr 2002/09/20 08:18:13 PDT
>
> Modified files:
>sys/pci if_dc.c
> Log:
> Fix the support for the AN985/983 chips, which do not set the
> RXSTATE to STOPPED, but to WAIT. This should fix hangs which
> could only be solved by replugging the cable.

John's already mentioned we are still thinking about the right way to handle this but...

> MFC after: 2 weeks

... I thought I should explicitly mention that merging this particular change as it stands is a bad idea because PNIC and Davicom cards (at least) are not yet correctly handled.

The code in -stable is the old broken but apparently harmless code. This new code is attempting to be more correct but breaks support for some cards. Odd situation, no?

Stephen.
if_dc broken in -current
It's been quite a while since I updated my -current box, but when I did, I was surprised to find that my DE500 network card (21143 chip) had stopped working. The switch showed no link. Ifconfig showed "no carrier".

After some fiddling, I reverted revision 1.56 (removal of mii_pollstat call) of sys/pci/if_dc.c and the DE500 went back to normal. It auto-negotiated 100Mbit full duplex, and now works fine.

I expect the problem is actually in mii/dcphy.c but since I have very little understanding of how this mii stuff is supposed to work, I have to leave that to others. If no one is available to give me a hand here, I'll have to go with plan B which is to simply back out rev 1.56 of if_dc.c. (That's not such a bad plan really, just slightly inefficient.)

On a different dc driver note, I'm interested in knowing if anyone is using either a PNIC or Davicom with -current. There is a slight difference between -current and -stable, and the code in -current caused problems with PNIC and Davicom cards when it was briefly in -stable. I'm assuming that nobody is using such cards, and the little bit of code is going to annoy a few people when they try the 5.0 prerelease. I'd like to fix this before it causes too much trouble.

For those who are curious, the troublesome piece of code is lines 1339 and 1340 (in rev 1.69):

if (isr & DC_ISR_TX_IDLE &&
    (isr & DC_ISR_RX_STATE) == DC_RXSTATE_STOPPED)

which waits for confirmation that the transmitter and receiver are both idle before some configuration registers are fiddled with. With PNIC and Davicom cards, one or the other of these conditions never occurs. Or at least that was the trouble when this was in -stable, back in August. Could this problem have "magically" gone away?

Stephen.
Re: if_dc broken in -current
On Friday, 22nd March 2002, "Ilmar S. Habibulin" wrote:

>On Sat, 23 Mar 2002, Stephen McKay wrote:
>
>> It's been quite a while since I updated my -current box, but when I did,
>> I was surprised to find that my DE500 network card (21143 chip) had stopped
>> working. The switch showed no link. Ifconfig showed "no carrier".
>
>I've had the simular problem. Now i have media option set to needed value
>in ifconfig_dc0 variable. This helped.

What sort of card do you have? The output of dmesg would help. Have you tried 4.5 on this machine?

Of course the dc driver should autonegotiate (and does so when I revert rev 1.56). Your info could help trace this problem.

Stephen.

PS I'm now assuming the number of -current users that use PNIC and Davicom cards with the dc driver is exactly zero. Oh well.
Re: if_dc broken in -current
On Monday, 25th March 2002, "Ilmar S. Habibulin" wrote:

>On Mon, 25 Mar 2002, Stephen McKay wrote:
>
>> What sort of card do you have? The output of dmesg would help. Have you
>> tried 4.5 on this machine?
>I have some noname nic with Intel 21143 chip. dmesg attached. I'm using
>only trustedbsd_mac branch on my ws.

Yours seems to be the same as mine (from a chip and phy point of view) although mine has a DEC assigned ethernet address and yours is from Telebit. I don't think that difference matters.

>> Of course the dc driver should autonegotiate (and does so when I revert
>> rev 1.56). Your info could help trace this problem.
>Well, i don't think this is the problem. Hardware became too much
>inteligent now a days, so one have to use his own hands to make this
>hardware work like user wants it to work. Maybe just put some FAQ about
>dc(4) and autoconfigurable hubs/switches?

Some things can be blamed on attempted intelligence gone wrong. But not this one. This is a simple bug. My card works perfectly under 4.5.0 on the same machine. It fails with -current. But with one change reverted, it works again.

Now all I have to do is work out what is the real underlying cause, since the current code looks right at first glance. At least I have the old DEC datasheets, and some info on some of the clones.

Stephen.
Re: if_dc broken in -current
On Monday, 25th March 2002, Robert Watson wrote:

>I think I have an identical problem involving a Linksys ethernet card
>using if_dc. I have to force it to negotiate 10mbps, since it fails to
>negotiate anything higher with my 10/100 switch. No idea why at all.
>
>dc0: port 0xe800-0xe8ff mem
>0xfebfff00-0xfebf irq 10 at device 19.0 on pci0
>dc0: Ethernet address: 00:a0:cc:35:3e:56
>miibus0: on dc0
>dcphy0: on miibus0
>dcphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>
>dc0: flags=8843 mtu 1500
>inet6 fe80::2a0:ccff:fe35:3e56%dc0 prefixlen 64 scopeid 0x1
>inet 192.168.11.150 netmask 0xff00 broadcast 192.168.11.255
>ether 00:a0:cc:35:3e:56
>media: Ethernet 10baseT/UTP
>status: active
>
>If I set it to auto-negotiate or hard-set to 100mbps, no packets go back
>or forth. I've had this problem for at least a year, if not longer. I
>have the same problem with 4.4-STABLE using an identical card on different
>hardware: if it tries to negotiate 100mbps, then it simply doesn't work.
>If I force it to 10, it's fine.

After careful consideration, I think this has to be a different problem.

My problem is that auto-negotiation doesn't start at boot (when an address is assigned to dc0). If I explicitly set a speed, that speed works. Most bizarrely, if I misspell the media option, that causes a successful autonegotiation! I mean, I type "ifconfig dc0 media 10baset" immediately after boot, and autonegotiation takes over. (If I spell it "10baset/utp" it goes into 10Mbit half-duplex mode, like you expect.) So it's just a hair's breadth away from working properly, and reverting rev 1.56 is enough for full operation to be restored.

Since you explicitly set 100Mbit half-duplex and it doesn't work, then that must be something else. We could have a go at finding that bug too, but it will be harder, since I don't have a PNIC II here. I do have some info on the Macronix 98715A, which Bill Paul says is almost the same. Maybe we can get lucky.

Stephen.
Re: Enhancing the user experience with tcsh
On Friday, 10th February 2012, Eitan Adler wrote:

>-alias la ls -a
>+alias la ls -aF
> alias lf ls -FA
>-alias ll ls -lA
>+alias ll ls -lAF
>+alias ls ls -F
>
>Two people didn't like these changes but didn't explain why. This is
>incredibly helpful, especially for a new user. If you dislike the
>alias change please explain what bothers you about it?

You should never, ever alias over a standard command in a default profile. It will only train new users incorrectly. Having to use \ls to get the real ls is not an answer.

If you think -F should be the default behaviour of ls, commit it directly to the ls source. Then run away fast! :-)

As for the other ls aliases, I don't see the point given "lf" already exists.

My only advice for your overall .cshrc changes is to be minimal and aim low. You may have a chance at consensus then. Good luck!

By the way, one of the nice things about FreeBSD vs Linux is that less shell configuration is set up by default, so less work is needed to undo it all before you can get your own settings done. Every "helpful" thing that is set in /.cshrc or any other global config file is something someone somewhere will have to discover and turn off. Try not to make it too hard for them.

Stephen.

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"