On 4/30/09 6:15 PM, "Reese" <howel...@inkworkswell.com> wrote:
> Tom Worster wrote:
>
>> why use SGML character entity references in a utf-8 file or stream? can't
>> you just put the character in the file?
>
> Because, I thought, HTML files were basically just text files with
> different file extensions, and that those other characters would not
> store or display properly if saved in .txt format. Was I mistaken?

yes. see http://www.w3.org/TR/html401/charset.html which says that html uses
the UCS, a character-by-character equivalent to the Unicode character set. so
if you use a unicode character encoding (such as utf-8) then you have a
direct encoding for every unicode character in html. so a utf-8 html file or
stream should normally have no entities other than &lt;, &gt;, &amp; and
perhaps &quot; as needed.

text files may be utf-8 encoded too. xml, json and csv files are more
examples of text files that can be utf-8 encoded and use any unicode
character simply and directly. windows and os-x have both supported utf-8
text files for many years now.

> http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
>
> That is supposed to be a UTF-8 encoded text file, between 1/3 and 1/2
> of the characters do not display correctly on my screen.

this page looks great to me. i don't appear to have the runes or amharic
fonts on my computer so those aren't showing but everything else works.

why this doesn't work for you is not clear. it could be that your browser
has a preference configured to override the charset specified in the http
headers. or perhaps the browser does not observe the specified content type
for txt files.

> Either way,
> this next link suggests that Turkish characters with no equivalent in
> the English language should be encoded for Web display:
>
> http://webdesign.about.com/od/localization/l/blhtmlcodes-tr.htm

don't believe everything you read on the web. while some browsers may
tolerate it, i don't think pages encoded according to those suggestions
would even be valid html.
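
to make the point about putting the characters straight into the file concrete, here is a minimal sketch in python (not php, and not taken from any of the pages linked above). the function name build_page and the sample string are made up for illustration. it assumes the page is served or saved as utf-8: only the markup-significant characters get escaped, and the turkish letters go into the output as plain utf-8 bytes.

```python
import html


def build_page(body_text: str) -> bytes:
    """Return a small HTML 4.01 page, encoded as UTF-8, containing body_text.

    Hypothetical helper for illustration only.
    """
    # html.escape touches only &, <, > and (with quote=True) the quote
    # characters; every other Unicode character is left exactly as it is.
    safe = html.escape(body_text, quote=True)
    page = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">\n'
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "<title>utf-8 demo</title>\n"
        "</head><body>\n"
        f"<p>{safe}</p>\n"
        "</body></html>\n"
    )
    # The declared charset and the actual byte encoding must agree.
    return page.encode("utf-8")


# Sample text with Turkish letters plus characters that do need escaping.
sample = 'Ağrı & İstanbul: "çay" < şeker'
page_bytes = build_page(sample)

# Decoding with the same encoding gives back the characters unchanged.
print(page_bytes.decode("utf-8"))
```

the same bytes written to a .txt file would be just as valid a utf-8 text file; whether they display correctly then depends on the viewer being told, or guessing, that the encoding is utf-8.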

> And because that is off-topic, I'll throw this in:
>
> The consensus seems to be that the proposed "ifset()" and "ifempty()"
> functions are more effort than they are worth. What I'd like to know
> is, why "empty()" still exists when every time I turn around, the
> mentors I turn to locally tell me not to use it, to use "isset()"
> instead. Because empty() doesn't work with zero. Anyone care to take
> a stab at that?

perhaps because it's hard to get rid of language elements without breaking
existing code?

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php