Re: Clean-room Engine.IO implementation committed to git 8.0/8.1

Chris Angelico Wed, 23 Nov 2016 04:50:12 -0800

On Wed, Nov 23, 2016 at 11:10 PM, Marcus Comstedt (ACROSS) (Hail
Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se>
wrote:
>>I agree, but using string(8bit) to mean "binary data" is something
>>that's 100% backward compatible.
>
> It would not be backwards compatible, since that is not what
> string(8bit) means today.


By "binary data", I mean eight-bit strings of arbitrary bytes - like
you'd read from a file or something. Currently, functions like
Stdio.read_file are declared as returning plain "string", but what
they actually return is effectively string(8bit).

>>Unicode text would always be referred
>>to as string(21bit), even if it happens to contain nothing but Latin-1
>>characters.
>
> That doesn't really make sense.  So you say that "R\xe4ksm\xf6rg\xe5s"
> would have type string(21bit)?  What type would "\U12345678" have?

\U12345678 possibly should be an error, as it's not valid Unicode
(it's well beyond U+10FFFF, the highest code point).
Maybe the Pike string type can be used for other things, but they're
not Unicode text - so you could use string(32bit) for those sorts of
non-textual strings. (I don't know of any use cases, so I can't say
beyond that.) My statement about Unicode text specifically excludes
anything that isn't valid according to the Unicode standard.

> What type would "Foo" have?  How would you specify a UTF-8 encoded
> literal?

Now, these are questions that can't truly be answered with the current
system. I would like the former to be string(7bit), and the latter
would be either string(7bit) or string(8bit) depending on whether
there are non-ASCII characters in it. But they're probably both just
type 'string' at the moment.
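
A rough sketch of why the current system can't tell these cases apart,
assuming String.width and string_to_utf8 behave as documented for Pike
8.0 (the variable names are just for illustration). Both the Latin-1
text and its UTF-8 encoding should report a storage width of 8, so the
width alone says nothing about whether a string holds text or bytes:

```pike
int main()
{
  // Text that happens to fit in Latin-1 (example literal from this thread).
  string text = "R\xe4ksm\xf6rg\xe5s";

  // The same text encoded as UTF-8: a string of arbitrary bytes.
  string encoded = string_to_utf8(text);

  // String.width reports the storage width in bits (8, 16 or 32),
  // not whether the contents are meant as text or as binary data.
  write("text:    width %d, length %d\n", String.width(text), sizeof(text));
  write("encoded: width %d, length %d\n", String.width(encoded), sizeof(encoded));

  // Plain ASCII is also stored as an 8-bit string at runtime.
  write("ascii:   width %d\n", String.width("Foo"));
  return 0;
}
```

Only the lengths differ between the first two, which is the distinction
I'd want the declared type to carry instead.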

ChrisA
