True, I typed in haste, although my thoughts still seem valid: most string data was smaller back then, and we didn't have thousands of buggy programs blindly copying data from a socket into a static buffer until they hit a magic 0x00.
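For what it's worth, here is a minimal C sketch of the contrast as I see it: a terminator-based copy has to scan for the 0x00 before it can move anything, while a length-prefixed copy reads the count and moves once. The names copy_cstr, copy_lp and make_lp are made up for illustration, and the 4-byte host-order length is my assumption; this is not IBM's actual VB record layout, just the general idea of the length preceding the data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Terminator-based copy: scan for 0x00 first, then move the bytes.
   Returns the string length, or -1 if it will not fit in dst. */
long copy_cstr(char *dst, size_t dst_size, const char *src)
{
    size_t n = strlen(src);      /* the "scan" step: find the terminator */
    if (n >= dst_size)           /* need room for n bytes plus the 0x00 */
        return -1;               /* refuse rather than overrun the buffer */
    memcpy(dst, src, n + 1);     /* the "move" step */
    return (long)n;
}

/* Hypothetical length-prefixed record: a 4-byte length in host byte
   order (an assumption for this sketch), followed by the data bytes.
   Returns total bytes written, or 0 if the record will not fit. */
size_t make_lp(unsigned char *dst, size_t dst_size,
               const void *data, uint32_t n)
{
    if (dst_size < sizeof n || n > dst_size - sizeof n)
        return 0;
    memcpy(dst, &n, sizeof n);
    memcpy(dst + sizeof n, data, n);
    return sizeof n + n;
}

/* Length-prefixed copy: read the length, check it, move once.
   No scan is needed, and an embedded 0x00 is just another data byte.
   Returns the data length, or -1 if the record will not fit in dst. */
long copy_lp(unsigned char *dst, size_t dst_size, const unsigned char *src)
{
    uint32_t n;
    memcpy(&n, src, sizeof n);   /* load the record length */
    if (dst_size < sizeof n || n > dst_size - sizeof n)
        return -1;
    memcpy(dst, src, sizeof n + n);
    return (long)n;
}
```

Both copies refuse to write past the destination, but only the length-prefixed one knows the size without reading every byte first, and only it can carry data that happens to contain a zero.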
I was assuming that the PDP-7 would have a handy equivalent to REPNZ SCASB; otherwise, to my mind, they wouldn't have used null-terminated strings so readily. I still want to know why Intel didn't implement REPNZ MOVSB, which would have worked better with C strings :-)

But explicit length is definitely the way in my book, along with the ability to do record I/O as well as stream.

Roops

On Tue, 31 Dec 2024, 11:46 Jay Maynard, <000005997213d6c2-dmarc-requ...@listserv.ua.edu> wrote:

> Thompson and Ritchie were working on the PDP-7 and PDP-11 during the
> formative years of C and its predecessors B and BCPL, not the x86, so they
> didn't have REP MOVSB and friends available. From a Bell Labs article on
> the evolution of C:
>
> "None of BCPL, B, or C supports character data strongly in the language;
> each treats strings much like vectors of integers and supplements general
> rules by a few conventions. In both BCPL and B a string literal denotes the
> address of a static area initialized with the characters of the string,
> packed into cells. In BCPL, the first packed byte contains the number of
> characters in the string; in B, there is no count and strings are
> terminated by a special character, which B spelled `*e'. This change was
> made partially to avoid the limitation on the length of a string caused by
> holding the count in an 8- or 9-bit slot, and partly because maintaining
> the count seemed, in our experience, less convenient than using a
> terminator."
>
> https://www.bell-labs.com/usr/dmr/www/chist.html
>
> On Tue, Dec 31, 2024 at 5:31 AM Rupert Reynolds <rreyno...@cix.co.uk>
> wrote:
>
> > I've long been searching for a solution that might become a de facto
> > standard. So far we have a collection of proprietary file formats.
> >
> > But we should remember that C was originally not a general purpose
> > language, but a tool for a specific need.
> >
> > Few would have predicted that C and its silly strings would power a new
> > wave of computing (which ignored many of the lessons learned even before
> > S/360 was released--as Alan Kay said, computing 'is a pop culture')
> >
> > Thompson and Ritchie needed something to build an OS for lower cost
> > hardware and the assembler wasn't sophisticated enough to make that easy,
> > or even doable, but C worked for them.
> >
> > They only needed to move short strings, so having the CPU run REPNZ SCASB
> > followed by REP MOVSB was an acceptable overhead. Oh yes, why was there no
> > REPNZ MOVSB?
> >
> > One of the reasons I use Rexx for testing everything is that it doesn't
> > barf on magic numbers hidden in the data.
> >
> > Roops
> >
> > On Tue, 31 Dec 2024, 08:01 Clement Clarke, <ad...@oscar-jol.com> wrote:
> >
> > > Over many years, I have brought up the C string problem. Both its speed
> > > and safety. I developed some C macros which improve the situation, however
> > > I think that by simply introducing a new file type, say .VB like .EXE, many
> > > speed and safety issues can be corrected relatively simply, as well
> > > as reducing the electricity and cooling water needed to shunt 5 billion
> > > emails around the world each day.
> > >
> > > The VB format would follow IBM's VB format. Thus the length of the record
> > > would precede the data, and allow records to be moved and copied without
> > > constantly searching for binary zeros, or carriage returns and line feeds.
> > >
> > > My tests have consistently shown that copying strings with C takes a
> > > minimum of about 2.5 times longer to copy C terminated strings. Often, the
> > > time is greater depending on the compiler and associated routines.
> > >
> > > I believe implementing such code would be relatively trivial. And I
> > > would also suggest the record length and blocksize be 4 bytes rather than 2
> > > bytes.
> > >
> > > The open routines could look to see if the file or data set is .VB, and if
> > > so set a flag. When a GET or equivalent is issued, then it is a simple
> > > matter to read the block if necessary, and load the length of the record
> > > and issue an MVCL or equivalent on other machines.
> > >
> > > Gradually, C and C++ could be changed internally to follow the excellent
> > > PL/I practice of having the length of variable length strings at the front
> > > of the strings which means searching for string terminators is not
> > > necessary.
> > >
> > > Kind regards, and a Happy New Year to all,
> > >
> > > Clem Clarke
> > > PS: Some further information is at
> > > https://start.oscar-jol.com/fast-safe-c-strings
> > >
> > > Charles Mills kindly suggested that C++ was the answer. However, I think
> > >
> > > ----------------------------------------------------------------------
> > > For IBM-MAIN subscribe / signoff / archive access instructions,
> > > send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
>
> --
> Jay Maynard