"Kaz Kylheku" <k...@kylheku.com> wrote in message news:20120408114313...@kylheku.com...
Worse, the one byte Unix mistake being covered is, disappointingly, just a clueless rant against null-terminated strings. Null-terminated strings are infinitely better than the ridiculous encapsulation of length + data. For one thing, if s is a non-empty null-terminated string, then cdr(s) is also a string representing the rest of that string without the first character, where cdr(s) is conveniently defined as s + 1.
If strings are represented as (ptr,length), then a cdr(s) would have to return (ptr+1,length-1), or (nil,0) if s was one character. No big deal. (Note I saw your post in comp.lang.python; I don't know about any implications of that for Lisp.) And if, instead, you want to represent all but the last character of the string, then it's just (ptr,length-1). (Some checking is needed around empty strings, but similar checks are needed around s+1.) In addition, if you want to represent the middle of a string, then it's also very easy: (ptr+a,b).
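A rough C sketch of those (ptr,length) operations, just to show how little code is involved. The `Slice` type and the `slice_*` names are invented for this illustration and do not come from any library; empty results are normalised to (NULL,0), standing in for the (nil,0) above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view: a pointer plus a length, no terminator. */
typedef struct {
    const char *ptr;   /* first character, or NULL when empty */
    size_t len;        /* number of characters */
} Slice;

/* Build a slice; any empty slice becomes (NULL,0). */
Slice slice_make(const char *ptr, size_t len) {
    Slice s = { len ? ptr : NULL, len };
    return s;
}

/* All but the first character: (ptr+1, length-1). */
Slice slice_cdr(Slice s) {
    return s.len > 1 ? slice_make(s.ptr + 1, s.len - 1) : slice_make(NULL, 0);
}

/* All but the last character: (ptr, length-1). */
Slice slice_drop_last(Slice s) {
    return s.len > 1 ? slice_make(s.ptr, s.len - 1) : slice_make(NULL, 0);
}

/* b characters starting at offset a: (ptr+a, b). */
Slice slice_mid(Slice s, size_t a, size_t b) {
    assert(a <= s.len && b <= s.len - a);   /* caller must stay in bounds */
    return b ? slice_make(s.ptr + a, b) : slice_make(NULL, 0);
}

/* Compare a slice with a null-terminated literal (helper for checking). */
int slice_eq(Slice s, const char *lit) {
    size_t n = strlen(lit);
    return s.len == n && (n == 0 || memcmp(s.ptr, lit, n) == 0);
}
```

With `s = slice_make("bart", 4)`, `slice_cdr(s)` views "art", `slice_drop_last(s)` views "bar" and `slice_mid(s, 1, 2)` views "ar", all without copying or modifying the original bytes.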
Not only can compilers compress storage by recognizing that string literals are the suffixes of other string literals, but a lot of string manipulation code is simplified, because you can treat a pointer to interior of any string as a string.
Yes, the string "bart" also contains "art", "rt" and "t". But with counted strings, it can also contain "bar", "ba", "b", etc. There are a few advantages to counted strings too...
length + data also raises the question: what type is the length field? One byte? Two bytes? Four?
Depends on the architecture. But 4+4 bytes on 32-bit machines, and 8+8 bytes on 64-bit ones, I would guess, for general flex strings of any length. There are other ways of encoding a length. (For example, I use one short string type of maximum M characters, where the current length N is encoded into the string itself, without needing any extra count byte (by fiddling about with the last couple of bytes). If you're trying to store a short string in an 8-byte field in a struct, then this will let you use all 8 bytes; a zero-terminated one, only 7.)
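The exact encoding is not spelled out above, so the following is only one hypothetical way such a short string could work, not necessarily the scheme being described. It assumes M = 8 and that, when the string is full, its final character has a code of at least 8. Then the last byte can double as a length tag: a value below 8 is read as N, anything else means the field is full. `ShortStr`, `short_make` and `short_len` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SHORT_MAX = 8 };   /* M: width of the field in bytes (assumed) */

typedef struct {
    unsigned char b[SHORT_MAX];   /* characters, with the length folded into b[7] */
} ShortStr;

/* Store n <= SHORT_MAX bytes from src.
   Assumption: a full string's last byte is >= SHORT_MAX, so it cannot be
   mistaken for a length tag. */
ShortStr short_make(const char *src, size_t n) {
    ShortStr s;
    assert(src != NULL && n <= SHORT_MAX);
    assert(n < SHORT_MAX || (unsigned char)src[SHORT_MAX - 1] >= SHORT_MAX);
    memset(s.b, 0, sizeof s.b);
    memcpy(s.b, src, n);
    if (n < SHORT_MAX)
        s.b[SHORT_MAX - 1] = (unsigned char)n;   /* spare last byte holds N */
    return s;
}

/* Length in constant time: a small last byte is a tag, otherwise the field is full. */
size_t short_len(ShortStr s) {
    unsigned char last = s.b[SHORT_MAX - 1];
    return last < SHORT_MAX ? last : SHORT_MAX;
}
```

Under these assumptions an 8-character name such as "abcdefgh" fits in the 8-byte field, where a zero-terminated field of the same size would hold only 7 characters.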
And then you have issues of byte order.
Which also affects every single value of more than one byte.
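For what it's worth, the usual way around byte order for a length field is the same as for any other multi-byte value: pick an order for the file or wire format and assemble the bytes by shifting. A small sketch, assuming a 4-byte big-endian length prefix; the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit length most significant byte first, whatever the host order. */
void put_len_be(unsigned char out[4], uint32_t n) {
    out[0] = (unsigned char)(n >> 24);
    out[1] = (unsigned char)(n >> 16);
    out[2] = (unsigned char)(n >> 8);
    out[3] = (unsigned char)n;
}

/* Read the length back, again independent of the host byte order. */
uint32_t get_len_be(const unsigned char in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Encode a counted string as a 4-byte length followed by the data.
   Returns the number of bytes written, or 0 if buf is too small. */
size_t encode_counted(unsigned char *buf, size_t cap, const void *data, uint32_t n) {
    assert(buf != NULL);
    if (cap < 4 || cap - 4 < n)
        return 0;
    put_len_be(buf, n);
    if (n)
        memcpy(buf + 4, data, n);
    return 4 + (size_t)n;
}
```

Encoding "bart" this way gives the bytes 00 00 00 04 followed by 'b' 'a' 'r' 't' on any machine.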
Null terminated C strings can be written straight to a binary file or network socket and be instantly understood on the other end.
But they can't contain nulls!
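A tiny illustration of that limitation, with made-up names: the same five bytes look like a 2-character string to anything that scans for the terminator, while a reader that is given the count sees all of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Five bytes of data with a zero byte in the middle. The array size is
   explicit, so no extra terminator is appended. */
const char payload[5] = { 'a', 'b', '\0', 'c', 'd' };

/* Length as a null-terminated reader sees it: stops at the first zero byte. */
size_t cstring_len(const char *p) {
    assert(p != NULL);
    return strlen(p);
}

/* Count the zero bytes in a counted buffer; only possible because n is known. */
size_t count_nuls(const char *p, size_t n) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i] == '\0')
            k++;
    return k;
}
```

Here `cstring_len(payload)` reports 2, so "cd" is silently lost, while `count_nuls(payload, sizeof payload)` walks all 5 bytes and finds the embedded zero.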
Null terminated strings have simplified all kinds of text manipulation, lexical scanning, and data storage/communication code, resulting in immeasurable savings over the years.
They both have their uses. -- Bartc