On Wed, Nov 23, 2016 at 11:10 PM, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se> wrote:
>>I agree, but using string(8bit) to mean "binary data" is something
>>that's 100% backward compatible.
>
> It would not be backwards compatible, since that is not what
> string(8bit) means today.

By "binary data", I mean eight-bit strings of arbitrary bytes - like
you'd read from a file or something. Currently, functions like
Stdio.read_file simply return "string", but in effect they return
string(8bit).

>>Unicode text would always be referred
>>to as string(21bit), even if it happens to contain nothing but Latin-1
>>characters.
>
> That doesn't really make sense.  So you say that "R\xe4ksm\xf6rg\xe5s"
> would have type string(21bit)?  What type would "\U12345678" have?

\U12345678 should possibly be an error, as it's not valid Unicode (it
lies beyond U+10FFFF). Maybe the Pike string type can be used for other
things, but those aren't Unicode text - so you could use string(32bit)
for those sorts of non-textual strings. (I don't know of any use cases,
so I can't say beyond that.) My statement about Unicode text
specifically excludes anything that isn't valid according to the
Unicode standard.

> What type would "Foo" have?  How would you specify a UTF-8 encoded
> literal?

Now, these are questions that can't truly be answered with the current
system. I would like the former to be string(7bit), and the latter to
be either string(7bit) or string(8bit), depending on whether there are
non-ASCII characters in it. But they're probably both just type
'string' at the moment.

ChrisA
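
A rough way to picture the width classes discussed above is to look at the largest code point in a string. The sketch below is Python rather than Pike, and the helper names narrowest_width and is_unicode_text are made up for illustration; it only models the proposal and does not describe what the Pike compiler does today.

```python
# Hypothetical model of the proposed width classes; not Pike's actual behavior.
MAX_UNICODE = 0x10FFFF  # highest code point defined by the Unicode standard


def narrowest_width(code_points):
    """Return the narrowest class (7, 8, 21 or 32 bits) that holds every value."""
    top = max(code_points, default=0)
    if top < 0x80:
        return 7    # pure ASCII
    if top < 0x100:
        return 8    # fits in one byte: Latin-1 text or arbitrary binary data
    if top <= MAX_UNICODE:
        return 21   # anywhere else in the Unicode code space
    return 32       # outside Unicode entirely


def is_unicode_text(code_points):
    """Assumption: 'valid Unicode' is modeled as a plain range check only."""
    return all(0 <= cp <= MAX_UNICODE for cp in code_points)


text = "R\xe4ksm\xf6rg\xe5s"

foo_width = narrowest_width(map(ord, "Foo"))         # ASCII-only literal
text_width = narrowest_width(map(ord, text))         # Latin-1 code points
utf8_width = narrowest_width(text.encode("utf-8"))   # UTF-8 bytes of the same text
odd_width = narrowest_width([0x12345678])            # the \U12345678 case
odd_is_text = is_unicode_text([0x12345678])
```

Note that the measured width is only the narrowest representation. Under the proposal, the Latin-1-only text would still be declared string(21bit) because it is text, while its UTF-8 encoding would be string(8bit) because it is bytes.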