On Tue, Apr 17, 2018 at 1:49 PM, Fred Cisin via cctalk <cctalk@classiccmp.org> wrote:
> I always found it amusing that many programs (even FORMAT!) would fail
> with the wrong error message if their internal DMA buffers happened to
> straddle a 64K block boundary. THAT was a direct result of failure to
> adequately integrate, or at least ERROR-CHECK!, the segment-offset kludge
> bag. Different device drivers and TSRs could affect at 16 byte intervals
> where the segment of a program ended up loading.
> It was NOT hard to normalize the Segment:Offset address and MOVE the
> buffer to another location if it happened to be straddling.

Huh. I would guess that this is the source of a DOS bug that I found back in the day, reported to MS, and never heard back about.

Working on a large application, we ground out a new release, and I got a call from production (the guy who ran the floppy duplicator) that his quality control tests were failing -- the application on the floppies wouldn't start. I grabbed one; it ran OK on my machine but wouldn't run on production's test machine. I confiscated that machine and started swapping out hardware, but nothing helped. I tried adding tracing code to the application to narrow down the failure point, but discovered that changing the executable changed the behavior -- a heisenbug.

Eventually I worked out that the crash was related to the address at which the executable was loaded, which depended on the various TSRs that were loaded -- with the production test machine's driver configuration, the load address would reliably crash the application. If I adjusted the load address on my machine to match, I got the same crash.

To figure out what the crash was about, I ended up writing a small TSR that set the "break on every instruction" bit and pushed the PC and opcode out the serial port, then collected the data streams for the crashing and non-crashing configurations. I diffed the data streams to find where they diverged.
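For illustration, the trace comparison described above could be sketched roughly as follows. This is a hypothetical reconstruction in Python, not the original tooling: the trace format (a list of `(pc, opcode)` pairs) and the function name `first_divergence` are assumptions made for the example.

```python
from itertools import zip_longest


def first_divergence(good_trace, bad_trace):
    """Return (index, good_entry, bad_entry) at the first point where two
    instruction traces differ, or None if they are identical.

    Each trace is assumed to be a sequence of (pc, opcode) pairs. If one
    trace is shorter, the missing entry is reported as None.
    """
    for index, (good, bad) in enumerate(zip_longest(good_trace, bad_trace)):
        if good != bad:
            return index, good, bad
    return None


if __name__ == "__main__":
    # Made-up sample data: same PC, different opcode at the third step.
    good = [(0x0100, 0xB8), (0x0103, 0xCD), (0x0105, 0x55)]
    bad = [(0x0100, 0xB8), (0x0103, 0xCD), (0x0105, 0x00)]
    print(first_divergence(good, bad))
```

The entry just before the reported index is the last instruction both runs agreed on, which is where the search for the cause would start.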
The application was large enough to have overlays -- as the program was starting up, an overlay with run-once initialization code would be read from disk and jumped into. In the crash configuration, the overlay code seemed not to be read, or to be read incorrectly -- the first opcode in the overlay was wrong. I wrote a simple program that read a data file of the same size as the overlay into different locations in memory and verified the data that was read, demonstrating that DOS was failing for one buffer address but not another. I documented it, sent it off to MS, and told management that the bug was MS's and there really wasn't anything we could do. I never heard back from them, and have actively avoided MS software ever since.

A buffer boundary straddling error certainly sounds like the issue I was seeing; it feels very odd to see a plausible explanation 35 years later.

-- Charles
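As a rough illustration of the straddling condition and the normalization mentioned in the quoted text, here is a minimal Python sketch. It assumes the usual real-mode rule (physical address = segment * 16 + offset) and a DMA transfer that cannot cross a 64K physical boundary. The function names are invented for this example; this is not code from DOS or from the application discussed.

```python
def physical_address(segment, offset):
    """20-bit real-mode physical address: segment * 16 + offset."""
    return ((segment << 4) + offset) & 0xFFFFF


def normalize(segment, offset):
    """Rewrite segment:offset so the offset is in the range 0..15."""
    phys = physical_address(segment, offset)
    return phys >> 4, phys & 0xF


def straddles_64k(segment, offset, length):
    """True if a buffer of `length` bytes (length >= 1) crosses a 64K
    physical boundary, i.e. its first and last bytes lie in different
    64K pages."""
    start = physical_address(segment, offset)
    end = start + length - 1
    return (start >> 16) != (end >> 16)


def next_safe_address(segment, offset, length):
    """Physical address to use for the buffer: unchanged if it does not
    straddle, otherwise the start of the next 64K page."""
    start = physical_address(segment, offset)
    if not straddles_64k(segment, offset, length):
        return start
    return ((start >> 16) + 1) << 16


if __name__ == "__main__":
    # A 512-byte buffer at offset 0 of two load segments 16 bytes apart.
    for seg in (0x1FE0, 0x1FE1):
        print(hex(seg), straddles_64k(seg, 0x0000, 512))
```

With these sample values, the buffer at segment 0x1FE0 ends exactly at physical address 0x1FFFF and fits, while moving the load segment up by a single paragraph to 0x1FE1 pushes the last bytes past 0x20000. That is the kind of sensitivity to TSR and driver configuration described above.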