I made some progress. [By the way, NetBeans' console displays *everything* 100% fine. I decided to use one of the worst REPL consoles, IntelliJ's, because I want to make sure I really understand what's behind all this.]

(import '(java.io PrintWriter PrintStream FileInputStream)
        '(java.nio CharBuffer ByteBuffer)
        '(java.nio.charset Charset CharsetDecoder CharsetEncoder)
        '(org.xml.sax InputSource))

(def utf8 "UTF-8")
(def d-utf8 (.newDecoder (Charset/forName utf8)))
(def e-utf8 (.newEncoder (Charset/forName utf8)))

(def latin1 "ISO-8859-1")
(def d-latin1 (.newDecoder (Charset/forName latin1)))
(def e-latin1 (.newEncoder (Charset/forName latin1)))

(defmacro with-out-encod
  [encoding & body]
  `(binding [*out* (PrintWriter. (PrintStream. System/out true ~encoding) true)]
     ~@body
     (flush)))

(def s "québécois français")

(print s)
;quÔøΩbÔøΩcois franÔøΩaisnil
(with-out-encod latin1 (print s))
;qu?b?cois fran?aisnil
(with-out-encod utf8 (print s))
;qu?b?cois fran?aisnil

(def encoded (.encode e-utf8 (CharBuffer/wrap "québécois français")))
(def s-d (.toString (.decode d-utf8 encoded)))

(print s-d)
;quÔøΩbÔøΩcois franÔøΩaisnil
(with-out-encod latin1 (print s-d))
;qu?b?cois fran?aisnil
(with-out-encod utf8 (print s-d))
;qu?b?cois fran?aisnil

(def f-d (:content (let [x (InputSource. (FileInputStream. "french.xml"))]
                     (.setEncoding x latin1)
                     (clojure.xml/parse x))))

(print f-d)
;quÔøΩbÔøΩcois franÔøΩaisnil
(with-out-encod latin1 (print f-d))
;québécois français
(with-out-encod utf8 (print f-d))
;québécois français

So my theory, which is still almost certainly wrong, is:

1. When the input is a file whose encoding is, say, latin-1, it's easy to decode it and then encode it however one wants.

2. When the input is a literal string in the source file, it looks like it's impossible to encode it correctly, unless one first decodes it from the source file's encoding. But then, I don't yet know how to do this without actually reading the source file. :\

Daniel Jomphe wrote:
> I tried under Eclipse.
>
> Default console encoding configuration (MacRoman):
>
> #'user/s
> quÔøΩbÔøΩcois franÔøΩaisnil
> qu?b?cois fran?aisnil
>
> #'user/snc
> qu?b?cois fran?aisnil
> qu?b?cois fran?aisnil
>
> Console configured to print using ISO-8859-1:
>
> #'user/s
> qu�b�cois fran�aisnil
> qu?b?cois fran?aisnil
>
> #'user/snc
> qu?b?cois fran?aisnil
> qu?b?cois fran?aisnil
>
> Console configured to print using UTF-8:
>
> #'user/s
> québécois françaisnil
> québécois françaisnil
>
> #'user/snc
> québécois françaisnil
> québécois françaisnil
>
> So as I come to understand it, it looks like UTF-8 should be the
> rolls-royce for my needs.
>
> May I correctly conclude the following?
>
> Don't bother about encodings unless you're displaying something and
> it's unreadable; then, don't bother about it in the code; find a
> proper console or viewer.
>
> Doesn't that sound like offloading a problem to users? Isn't there
> something reliable that can be done in the code?
>
> Daniel Jomphe wrote:
> > Sorry for all these posts.
> >
> > I pasted my last post's code into a fresh repl (not in my IDE), and
> > here's what I got (cleaned up):
> >
> > #'user/s
> > québécois françaisnil
> > qu?b?cois fran?aisnil
> >
> > #'user/snc
> > québécois françaisnil
> > qu?b?cois fran?aisnil
> >
> > I'm not sure what to make out of it.
> >
> > My terminal (Apple Terminal) supports the encoding, and prints
> > correctly s and snc out of the box.
> > When I use with-out-encoded, I actually screw up both s and snc's
> > printing.
> >
> > Daniel Jomphe wrote:
> > > Now that I know for sure how to bind *out* to something else over
> > > System/out, it's time to bring back my encoding issues into scope:
> > >
> > > (import '(java.io PrintWriter PrintStream))
> > >
> > > (defmacro with-out-encoded
> > >   [encoding & body]
> > >   `(binding [*out* (java.io.PrintWriter. (java.io.PrintStream.
> > >       System/out true ~encoding) true)]
> > >      ~@body
> > >      (flush)))
> > >
> > > (def nc "ISO-8859-1")
> > >
> > > ;;; with a normal string
> > > (def s "québécois français")
> > >
> > > (print s)
> > > ; quÔøΩbÔøΩcois franÔøΩaisnil
> > >
> > > (with-out-encoded nc (print s))
> > > ; qu?b?cois fran?aisnil
> > >
> > > ;;; with a correctly-encoded string
> > > (def snc (String. (.getBytes s nc) nc))
> > >
> > > (print snc)
> > > ; qu?b?cois fran?aisnil
> > >
> > > (with-out-encoded nc (print snc))
> > > ; qu?b?cois fran?aisnil
> > >
> > > I'm certainly missing something fundamental somewhere.
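
One way to test point 2 of the theory without depending on the source file's or the console's encoding might be to write the accented characters as Unicode escapes, and to compare that string with one that already contains U+FFFD, the replacement character a decoder substitutes for bytes it cannot read. This is only a sketch: `intact`, `damaged`, and `round-trip` are hypothetical names, and the idea that the IDE REPL hands over a literal with U+FFFD in place of each accented letter is an assumption, not something verified here.

```clojure
;; Sketch only; `intact`, `damaged`, and `round-trip` are hypothetical names.
(def latin1 "ISO-8859-1")

;; Unicode escapes sidestep whatever encoding the REPL or source file uses.
(def intact "qu\u00e9b\u00e9cois fran\u00e7ais")

;; What the literal would look like if the reader had already replaced
;; each accented letter with U+FFFD (an assumption about the IDE consoles).
(def damaged "qu\uFFFDb\uFFFDcois fran\uFFFDais")

(defn round-trip
  "Encodes s to bytes in charset-name, then decodes those bytes again."
  [^String s ^String charset-name]
  (String. (.getBytes s charset-name) charset-name))

;; Every character of `intact` exists in ISO-8859-1, so nothing changes.
(= intact (round-trip intact latin1))   ; true

;; U+FFFD has no ISO-8859-1 byte, so getBytes substitutes "?".
(round-trip damaged latin1)             ; "qu?b?cois fran?ais"

;; Inspecting code points shows the damage regardless of the console.
(map int (take 3 damaged))              ; (113 117 65533)
```

If `(map int s)` on the REPL literal shows 65533 where the accented letters should be, the string was already damaged when it was read, and no choice of output encoding can repair it. That would be consistent with `(String. (.getBytes s nc) nc)` not improving anything in the quoted message.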