Johnny,
Johnny Kewl wrote:
> If this locale stuff is in fact defaulting to an ISO char set that can
> do these symbols... and say you where making a non english page, say
> Japanese... do you think that its possible to use it?

It is up to your browser to choose a font that is appropriate for all
the glyphs (a glyph being the graphical representation of a code point)
that need to be displayed. Some fonts do not support all code points
because they simply don't have all the glyphs. For instance, if you have
a string in English and another in Sanskrit, your browser is likely to
display one string in one font (maybe Arial) and the other in a
different font that actually contains the Devanagari glyphs Sanskrit
needs.

Let's say that the browser comes across the &pound; entity. &pound;
maps directly to the 8-bit hex character code 0xa3
(http://htmlhelp.com/reference/html40/entities/latin1.html). Whether you
put &pound; or a literal £ in your HTML, the browser should render it
properly -- possibly switching fonts, for that one character, to a font
that supports that code point.

The problem with your page is not that the £ symbol is unavailable in
the font the browser chose. Your problem is that you illegally encoded
it into the page in the first place (or, equivalently, you advertise the
wrong encoding for the page, which amounts to the same thing). If you
re-write your page to wrap some <font> around that symbol, you will
never get it to work, unless you use the browser to override the
server-declared encoding (as Chuck did, when things rendered properly
once he switched to ISO-8859-1).

> I've actually now seen examples on the web that are doing it Wil's way,
> they using the getCurrencyInstance to make the currency symbols.

Use of Java's built-in currency-formatting methods is likely to produce
a proper £ symbol. If your encoding chain is set up properly, it should
go from NumberFormat.format() straight to your web page without a hint
of difficulty.

> But I'm thinking its a US/Eng only methodology... when applied to a web
> page.
> Do you think using getCurrencyInstance is generalizable in other languages?

Absolutely. The only reason $ is a "magic" symbol is that it is part of
US-ASCII and sits low enough in the character table that it never gets
mangled by incorrect encodings. Symbols like £ or € do not share that
luxury and are therefore error-prone when administrators configure their
servers poorly. It's further compounded by the fact that many
English-speaking coders forget that there are other people in the
world. :(

> When you say.... "If I override that with say ISO-8859-15", is that the
> whole page you talking about, or it possible to have different character
> encoding sections in a web page.... thats another area thats confusing
> me now, because if I do look at that test page in a MS tool... it
> displays correctly with mixed encodings?

The encoding applies to the entire document, not just a single
character. Basically, you sent an illegal character code. It would be
like sending 6 bits of an 8-bit byte. In fact, that's /exactly/ what you
did because, to a UTF-8 decoder, your lone byte looks like there should
be something else /before/ it in order to make it legal. Your server
said "hey, client... I'm gonna send you a bunch of oranges" and then
went right ahead and sent apples mixed in with those oranges.
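For what it's worth, here is a minimal, self-contained sketch of the
getCurrencyInstance point above (plain Java, nothing Tomcat-specific;
the exact grouping and spacing in the output depend on your JDK's locale
data):

  import java.text.NumberFormat;
  import java.util.Locale;

  public class CurrencyDemo {
      public static void main(String[] args) {
          double price = 1234.56;

          // Each locale-specific formatter picks the right currency
          // symbol, placement, and grouping rules for you.
          Locale[] locales = { Locale.US, Locale.UK, Locale.FRANCE, Locale.JAPAN };
          for (Locale locale : locales) {
              NumberFormat fmt = NumberFormat.getCurrencyInstance(locale);
              System.out.println(locale + " -> " + fmt.format(price));
          }
          // Typical output (details vary by JDK version):
          //   en_US -> $1,234.56
          //   en_GB -> £1,234.56
          //   fr_FR -> 1 234,56 €
          //   ja_JP -> ￥1,235
      }
  }

The formatter takes both the symbol and the formatting rules from the
locale, so there is nothing US/English-specific about the technique.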
> But when you choose a font in a text editor like Swing or Word, you are
> also picking some character set... and thats whats been injected into
> the page as its been formed...

Yes and no. Many encodings are limited to a particular character set
(for instance, US-ASCII is never going to contain Sanskrit letters). But
that's why Unicode was invented: to make sure that anything we could
ever possibly want to show on the screen can be represented, because
there are enough bits for all of it. (My understanding is that a 16-bit
encoding is actually not big enough for every Unicode code point, but
hey, they tried.)

The beauty of UTF-8 is that every character you'd want to display has
its own code that nobody can steal -- regardless of the font being used.

The lesson is to always use UTF-8 and make sure you actually have
everything working properly. If your server is saying "utf-8" but the
character encoding on your servlet's Writer is actually ISO-8859-1, then
you haven't done your job and your web pages are going to look broken as
soon as non-Latin characters show up (see the P.S. below for a quick
sketch). The same is true if you are serving static content (as I
suspect you are in your example) and advertising it as "utf-8" when the
file was written in ISO-8859-1 (or something else). (In your case, the
problem is that text files contain no explicit encoding information, so
the server has to guess -- or, more likely, there is no guessing going
on at all, and the server just blindly uses whatever default it has been
configured with.)

> I screw up terminology... ok we all know that.... but
> Does Wil need to worry about the way he is doing it?... thats all I'm
> asking... I think so...

The short answer is no: Wil does not need to worry. If his code is
generating a proper € or £ then, as long as the server isn't lying about
the encoding, everything will be fine. Unless the browser sucks. ;)

-chris
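P.S. A minimal sketch of what I mean by keeping the advertised charset
and the Writer in agreement -- the servlet class name and the hard-coded
UTF-8/Locale.UK choices are just for illustration:

  import java.io.IOException;
  import java.io.PrintWriter;
  import java.text.NumberFormat;
  import java.util.Locale;

  import javax.servlet.ServletException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  public class PriceServlet extends HttpServlet {
      protected void doGet(HttpServletRequest request, HttpServletResponse response)
              throws ServletException, IOException {
          // Setting the charset *before* getWriter() does two things at
          // once: the Content-Type header advertises UTF-8, and the
          // Writer actually encodes the output as UTF-8. The header and
          // the bytes now agree.
          response.setContentType("text/html; charset=UTF-8");
          PrintWriter out = response.getWriter();

          String price = NumberFormat.getCurrencyInstance(Locale.UK).format(1234.56);

          out.println("<html><body>");
          out.println("<p>Total: " + price + "</p>");  // £1,234.56, sent as legal UTF-8
          out.println("</body></html>");
      }
  }

For static files there is no Writer to configure, so the only fix is to
make the bytes in the file match whatever charset the server is
configured to advertise for them (or vice versa).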