I will hazard a guess: when you open the file for reading, you can open it as binary first and check whether the first two bytes are FE FF, yes? If so, treat it as UTF-16. If not, treat it as UTF-8. I have not tested this strategy myself, but your second point seems to give the clue to solving this mystery.
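Something along these lines might do it. This is only an untested sketch: the handler name guessEncodingFromBOM is made up, and the checks for FF FE (little-endian UTF-16) and EF BB BF (the UTF-8 signature) are additions beyond the FE FF case quoted below. It assumes a LiveCode version that has the byte chunk and byteToNum.

```livecode
-- Hypothetical helper: guesses a text file's encoding from its first bytes.
-- Returns "UTF-16BE", "UTF-16LE", "UTF-8", or empty when no BOM is found.
function guessEncodingFromBOM pFilePath
   local tHead, tB1, tB2

   -- Open as binary so no line-ending or character translation is applied
   open file pFilePath for binary read
   if the result is not empty then return empty  -- could not open the file
   read from file pFilePath for 3
   put it into tHead
   close file pFilePath

   -- Too short to hold any BOM
   if the number of bytes in tHead < 2 then return empty

   put byteToNum(byte 1 of tHead) into tB1
   put byteToNum(byte 2 of tHead) into tB2

   if tB1 is 254 and tB2 is 255 then return "UTF-16BE"  -- FE FF
   if tB1 is 255 and tB2 is 254 then return "UTF-16LE"  -- FF FE

   if the number of bytes in tHead >= 3 then
      -- EF BB BF
      if tB1 is 239 and tB2 is 187 and byteToNum(byte 3 of tHead) is 191 then
         return "UTF-8"
      end if
   end if

   return empty  -- no BOM: the caller has to pick a default
end function
```

You would call it with something like `put guessEncodingFromBOM(theFile) into tEncoding` before reading the text. An empty result only means there was no BOM. A file without one could be UTF-8, MacRoman, or something else, so the fallback is still a guess.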
Bob

On Jan 16, 2013, at 9:15 AM, Nishok Love wrote:

> Thanks, Bob. Your command works but the same results occur. Further
> investigations here found this
>
> When Pages is used to export as "Text", the resulting file may be of two
> kinds:
>
> (1) if the document contained only characters included in Apple MacRoman
> charset, the file is a pure text file based on Apple MacRoman encoding.
>
> (2) if the document contained extraneous characters the created text file
> take care of this feature and uses the UTF encoding (two bytes per character)
> and starts with the logical BOM: "FE FF".
>
> which I've copied from the discussion on
> https://discussions.apple.com/message/9518841?messageID=9518841#9518841?messageID=9518841
>
> Opening both files with TextEdit (which displays both of them correctly, ie
> without all those extra spaces), duplicating them and then watching the save
> options shows that one file (the one from Pages) is using UTF-16 whilst
> Word's Western (Mac OS Roman) export is in UTF-8. Using GetInfo I can now see
> that the UTF-16 file is twice the size of the other.
>
> In short, text files are not as simple as they used to be!
>
> So I'm still looking for a way for LiveCode to spot whether it's opening a
> file in UTF-8 or UTF-16 (or something else - aaarrgh!). Can I access the file
> header? read from file just gives me the data...
>
> I could read the file, count the number of characters and how many of them
> are spaces and from that I could infer which format is being used. Probably
> this would be reliable for my purposes - just not very elegant!
>
> Nishok
>
>
>> I am not sure why you are seeing this.
>> I exported a pages newsletter file as
>> text, then ran this command on it:
>>
>> on mouseUp pMouseBtnNo
>>    answer file "Pick a text file" with "/Users/bobsneidar/Desktop/SneidarNewsletter.txt"
>>    put it into theFile
>>    open file theFile for read
>>    read from file theFile until cr
>>    put it
>>    close file theFile
>> end mouseUp
>>
>> I got this in the message box:
>>
>> 2005 Summer Edition
>>
>> Seems to work.
>>
>> Bob
>>
>>
>>
>> On Jan 15, 2013, at 10:34 AM, NISHOK LOVE wrote:
>>
>>> Hi All
>>>
>>> I have a problem when I open .txt files in OSX, and I don't have much
>>> (any!) experience of reading files in LiveCode.
>>>
>>> I have a file originally written in Word on Windows. When I export it as a
>>> .txt from Word for Mac I just accept the default Mac OS encoding option
>>> (Western (Mac OS Roman) and it all works fine when I open the file in my
>>> LiveCode.
>>>
>>> But when I open the original file in Pages and export it as Plain Text, I
>>> get a different result. When I open that file in LiveCode I find a space
>>> has been inserted after every character. So Hello world becomes H e l l o
>>> w o r l d.
>>>
>>> I guess this is a problem with the encoding, but how can my LiveCode
>>> understand what the incoming file's encoding is and respond accordingly? My
>>> LiveCode needs to be able to deal with any kind of text file...
>>>
>>> Thanks,
>>> Nishok Love
>>> _______________________________________________
>>> use-livecode mailing list
>>> [email protected]
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
