--- On Wed, 3/30/11, Kee Nethery <k...@kagi.com> wrote:
>
> > Ideally, all the conversion would take place at the end-points:
> > open file <theFilePath> for text read with encoding <theEncoding>
> > open file <theFilePath> for text write with encoding <theEncoding>
> > put <theVariable> into URL <theUrl> with encoding <theEncoding>
> > put URL <theUrl> into <theVariable> with encoding <theEncoding>
> >
> > Internally, the engine would handle everything in UTF-16,
>
> :-) UTF-16 BE or UTF-16 LE?
>
I don't care, as long as my app can read and write both, regardless of the platform it is running on ;-)

> and the with encoding, that would be what you want or what
> you have?
>

The encoding would be whatever encoding the incoming data is in when reading, and whatever encoding the outgoing data should end up in when writing.

In Java, every string uses UTF-16 internally, and conversion is handled via InputStreamReaders and OutputStreamWriters. Here's an example of using an InputStreamReader:
##
FileInputStream fis = new FileInputStream("input.txt");
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
##
which means that whatever you're reading from the file "input.txt" should be interpreted as UTF-8.

Likewise, the following example uses an OutputStreamWriter:
##
FileOutputStream fos = new FileOutputStream("output.txt");
Writer out = new OutputStreamWriter(fos, "ISO-8859-1");
##
which means that whatever you're writing to the file "output.txt" should end up on the hard drive as ISO-8859-1.

And if you don't specify the charset name, Java applies a platform-specific default, which you can override with a startup parameter (the file.encoding system property). LiveCode does something similar (think of how it automatically interprets CR/LF depending on the platform) unless you open the file for binary read and handle the encoding conversion yourself.

Anyway, in my earlier example of how LiveCode might do it:
##
open file "input.txt" for text read with encoding "UTF-8"
##
means that the engine should interpret whatever is in the file as UTF-8 encoded.

> If I was king of LiveCode Unicode, I'd make all text UTF8.
> I'd have char refer to the characters regardless how many
> bytes are required to encode it. Bytes would give you the
> actual encoding values. I'd have a convert function to
> change text (or binary if it came in as something other than
> utf8) into something else and I'd require the from encoding
> and the to encoding to be specified. The "with encoding"
> assumes I know which side of the encoding is assumed to be
> something (the from or the to) and I don't. I'm a big fan of
> explicit rather than assumed in the code I write.
>

That's why they introduced the 'byte' chunk type in version 3.0 - in preparation for a time when a char can be more than one byte and we wouldn't have to know or care, as the engine does the right thing.

Making everything UTF-8 means you'll have a statistically harder time figuring out chunk byte ranges, as you have to check every character's lead byte to know whether it actually takes 1, 2, 3 or 4 bytes. If you use UTF-16 instead, you'll eat more memory if your data stays in the ASCII range, but most character sets will fit happily into two bytes - and for the ones that do require 4 bytes instead of 2, you only need to check every other byte (each 16-bit code unit) for a surrogate.

Jan Schenkel.
=====
Quartam Reports & PDF Library for LiveCode
www.quartam.com

=====
"As we grow older, we grow both wiser and more foolish at the same time."
(La Rochefoucauld)
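P.S. To make that last point about chunk byte ranges a bit more concrete, here's a minimal Java sketch - not LiveCode engine code, just an illustration I put together, with made-up class and method names, and assuming well-formed input. Counting characters in a UTF-8 buffer means inspecting the lead byte of every character to learn whether it spans 1, 2, 3 or 4 bytes, while counting them in UTF-16 only means checking each 16-bit code unit for a high surrogate:
##
import java.nio.charset.StandardCharsets;

public class ChunkScan {

    // UTF-8: walk the buffer, reading each character's lead byte to
    // learn how many bytes (1, 2, 3 or 4) the character occupies.
    // Assumes the byte array is well-formed UTF-8.
    static int countCharsUtf8(byte[] utf8) {
        int chars = 0;
        int i = 0;
        while (i < utf8.length) {
            int lead = utf8[i] & 0xFF;
            if (lead < 0x80)       i += 1;  // ASCII, single byte
            else if (lead < 0xE0)  i += 2;  // 2-byte sequence
            else if (lead < 0xF0)  i += 3;  // 3-byte sequence
            else                   i += 4;  // 4-byte sequence
            chars++;
        }
        return chars;
    }

    // UTF-16: walk the 16-bit code units; only a high surrogate
    // signals that the character spans two units (4 bytes).
    static int countCharsUtf16(char[] utf16) {
        int chars = 0;
        int i = 0;
        while (i < utf16.length) {
            i += Character.isHighSurrogate(utf16[i]) ? 2 : 1;
            chars++;
        }
        return chars;
    }

    public static void main(String[] args) {
        // Sample mixing 1-, 2-, 3- and 4-byte UTF-8 characters:
        // ASCII, e-acute, CJK ideographs, and a supplementary-plane symbol.
        String sample = "abc \u00E9 \u6F22\u5B57 "
                + new String(Character.toChars(0x1D11E));
        System.out.println(countCharsUtf8(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(countCharsUtf16(sample.toCharArray()));
    }
}
##
Both functions return the same character count; the difference is how much per-character work it takes to find the boundaries, which is exactly what the engine would have to do for every char chunk expression.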