"Kaz Kylheku" <k...@kylheku.com> wrote in message
news:20120408114313...@kylheku.com...

Worse, the one byte Unix mistake being covered is, disappointingly, just a
clueless rant against null-terminated strings.

Null-terminated strings are infinitely better than the ridiculous
encapsulation of length + data.

For one thing, if s is a non-empty null-terminated string, then cdr(s) is
also a string representing the rest of that string without the first
character, where cdr(s) is conveniently defined as s + 1.

If strings are represented as (ptr,length), then a cdr(s) would have to
return (ptr+1,length-1), or (nil,0) if s was one character. No big deal.
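
A rough C sketch of the two versions of cdr, for comparison. The names
cdr_z, cdr_v and strview are made up for illustration, and the (NULL, 0)
result stands in for the (nil,0) case mentioned above:

```c
#include <assert.h>
#include <stddef.h>

/* Null-terminated: the tail of a non-empty string is the pointer plus one. */
const char *cdr_z(const char *s)
{
    assert(s != NULL && *s != '\0');   /* caller must pass a non-empty string */
    return s + 1;
}

/* Counted view (hypothetical type): a pointer plus a length; owns nothing. */
typedef struct {
    const char *ptr;
    size_t      len;
} strview;

/* Tail of a counted string: (ptr+1, length-1), or (NULL, 0) when at most
   one character is left. */
strview cdr_v(strview s)
{
    strview r = { NULL, 0 };
    if (s.len > 1) {
        r.ptr = s.ptr + 1;
        r.len = s.len - 1;
    }
    return r;
}
```

Both are a constant amount of work; the counted version just carries one
extra field around.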

(Note I saw your post in comp.lang.python; I don't know about any
implications of that for Lisp.)

And if, instead, you want to represent all but the last character of the
string, then it's just (ptr,length-1). (Some checking is needed around empty
strings, but similar checks are needed around s+1.)

In addition, if you want to represent the middle of a string, then it's also
very easy: (ptr+a,b).
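
As a sketch only, here is how those two operations might look in C. The
strview type and the function names drop_last and middle are assumptions
for the example, not anything standard:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counted view: pointer plus length, no terminator needed. */
typedef struct {
    const char *ptr;
    size_t      len;
} strview;

/* All but the last character: (ptr, length-1). An empty view stays empty. */
strview drop_last(strview s)
{
    if (s.len > 0)
        s.len -= 1;
    return s;
}

/* Middle of a string: b characters starting at offset a, i.e. (ptr+a, b). */
strview middle(strview s, size_t a, size_t b)
{
    assert(a <= s.len && b <= s.len - a);   /* stay inside the original */
    s.ptr += a;
    s.len  = b;
    return s;
}
```

Neither operation copies or modifies the underlying characters. A view can
be printed with printf("%.*s", (int)v.len, v.ptr), since it has no
terminator of its own.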

Not only can compilers compress storage by recognizing that string literals
are the suffixes of other string literals, but a lot of string manipulation
code is simplified, because you can treat a pointer to interior of any
string as a string.

Yes, the string "bart" also contains "art", "rt" and "t". But with counted
strings, it can also contain "bar", "ba", "b", etc....

There are a few advantages to counted strings too...

length + data also raises the question: what type is the length field? One
byte? Two bytes? Four?

Depends on the architecture. But 4+4 bytes on 32-bit machines, and 8+8 bytes
on 64-bit ones, I would guess, for general flex strings of any length.

There are other ways of encoding a length.

(For example I use one short string type of maximum M characters, but the
current length N is encoded into the string, without needing any extra count
byte (by fiddling about with the last couple of bytes). If you're trying to
store a short string in an 8-byte field in a struct, then this will let you
use all 8 bytes; a zero-terminated one, only 7.)
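
The exact fiddling isn't spelled out here, so the following C sketch shows
just one possible scheme of this general kind, not necessarily the one
described. It assumes M = 8 and that the last character of a full string
never has a code below 8 (true for printable text); shorter strings keep
their length N in the final byte. The names short_str, short_set and
short_len are invented for the example:

```c
#include <stddef.h>
#include <string.h>

enum { SHORT_MAX = 8 };            /* M: field width in bytes (assumed) */

typedef struct {
    unsigned char b[SHORT_MAX];
} short_str;

/* Store n bytes from src. Returns 0 on success, or -1 if the string is
   too long or cannot be represented under the assumption above. */
int short_set(short_str *s, const char *src, size_t n)
{
    if (n > SHORT_MAX)
        return -1;
    if (n == SHORT_MAX && (unsigned char)src[n - 1] < SHORT_MAX)
        return -1;                 /* last char would look like a length */
    memset(s->b, 0, SHORT_MAX);
    memcpy(s->b, src, n);
    if (n < SHORT_MAX)
        s->b[SHORT_MAX - 1] = (unsigned char)n;   /* length in last byte */
    return 0;
}

/* A final byte below M is a length; anything else is the Mth character. */
size_t short_len(const short_str *s)
{
    unsigned char last = s->b[SHORT_MAX - 1];
    return last < SHORT_MAX ? last : SHORT_MAX;
}
```

The price is the restriction on the final character: 8 bytes can't
distinguish every possible byte string of length 0 to 8, so some value has
to be given up somewhere.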

And then you have issues of byte order.

Which also affects every single value of more than one byte.
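
For what it's worth, the usual way round this is to fix the byte order of
the length on the wire, whatever the host uses. A minimal C sketch,
assuming a 32-bit length sent most significant byte first; put_counted and
get_length are made-up names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit length, most significant byte first, followed by the
   data bytes. Returns the number of bytes written, or 0 if out is too
   small. */
size_t put_counted(unsigned char *out, size_t cap,
                   const char *data, uint32_t len)
{
    if (cap < 4 || cap - 4 < len)
        return 0;
    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;
    memcpy(out + 4, data, len);
    return 4 + (size_t)len;
}

/* Read the length back; the shifts give the same answer on any host. */
uint32_t get_length(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

The same shifting is needed for any multi-byte integer in the file or
packet, so the length field adds nothing new in that respect.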

Null terminated C strings can be written straight to a binary file or
network socket and be instantly understood on the other end.

But they can't contain nulls!
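
A tiny C illustration of the point, using a made-up five-byte payload with
a zero in the middle; payload and z_length are names invented for the
example:

```c
#include <stddef.h>
#include <string.h>

/* Five data bytes, one of them zero; the compiler adds a terminator
   after the 'd', so sizeof payload is 6. */
static const char payload[] = "ab\0cd";

/* The length as any null-terminated reader sees it: it stops at the
   first zero byte. */
size_t z_length(const char *s)
{
    return strlen(s);
}
```

Here z_length(payload) gives 2, while sizeof payload - 1 is 5. A receiver
that only looks for the terminator never sees the last three bytes.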

Null terminated strings have simplified all kinds of text manipulation,
lexical scanning, and data storage/communication code resulting in
immeasurable savings over the years.

They both have their uses.

--
Bartc

