I made some progress.

[By the way, NetBeans' console displays *everything* 100% fine.
 I decided to use one of the worst REPL consoles: that of IntelliJ.
 I want to make sure I really understand what's going on behind all
this.]

  (import '(java.io PrintWriter PrintStream FileInputStream)
          '(java.nio CharBuffer ByteBuffer)
          '(java.nio.charset Charset CharsetDecoder CharsetEncoder)
          '(org.xml.sax InputSource))

  (def   utf8 "UTF-8")
  (def d-utf8 (.newDecoder (Charset/forName utf8)))
  (def e-utf8 (.newEncoder (Charset/forName utf8)))

  (def   latin1 "ISO-8859-1")
  (def d-latin1 (.newDecoder (Charset/forName latin1)))
  (def e-latin1 (.newEncoder (Charset/forName latin1)))

  (defmacro with-out-encod
    [encoding & body]
    `(binding [*out* (PrintWriter. (PrintStream. System/out true ~encoding)
                                   true)]
       ~@body
       (flush)))

  (def s "québécois français")

  (print s)                         ;quÔøΩbÔøΩcois franÔøΩaisnil
  (with-out-encod latin1 (print s)) ;qu?b?cois fran?aisnil
  (with-out-encod utf8   (print s)) ;qu?b?cois fran?aisnil

  (def encoded (.encode e-utf8
                        (CharBuffer/wrap "québécois français")))
  (def s-d
    (.toString (.decode d-utf8 encoded)))

  (print s-d)                         ;quÔøΩbÔøΩcois franÔøΩaisnil
  (with-out-encod latin1 (print s-d)) ;qu?b?cois fran?aisnil
  (with-out-encod utf8   (print s-d)) ;qu?b?cois fran?aisnil

  (def f-d
    (:content (let [x (InputSource. (FileInputStream. "french.xml"))]
         (.setEncoding x latin1)
         (clojure.xml/parse x))))

  (print f-d)                         ;quÔøΩbÔøΩcois franÔøΩaisnil
  (with-out-encod latin1 (print f-d)) ;québécois français
  (with-out-encod utf8   (print f-d)) ;québécois français
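
One thing worth checking in with-out-encod: as far as I know, PrintWriter's
(OutputStream, boolean) constructor builds its own OutputStreamWriter with the
platform default charset, so the encoding handed to the inner PrintStream never
touches the characters. Below is a minimal sketch of an alternative, assuming
the goal is only to choose the charset that *out* encodes with. charset-writer,
with-out-charset and encoded-bytes are hypothetical names made up for this
sketch, not part of Clojure.

```clojure
(import '(java.io ByteArrayOutputStream OutputStreamWriter PrintWriter))

;; Hypothetical helper: a PrintWriter that encodes characters with the
;; named charset before any byte reaches `stream`.
(defn charset-writer
  [stream encoding]
  (PrintWriter. (OutputStreamWriter. stream encoding) true))

;; Hypothetical variant of with-out-encod built on charset-writer.
(defmacro with-out-charset
  [encoding & body]
  `(binding [*out* (charset-writer System/out ~encoding)]
     ~@body
     (flush)))

;; Capture the bytes produced for a string, so they can be inspected
;; without depending on what any console does with them.
(defn encoded-bytes
  [s encoding]
  (let [sink (ByteArrayOutputStream.)]
    (binding [*out* (charset-writer sink encoding)]
      (print s)
      (flush))
    (vec (.toByteArray sink))))

;; \u00e9 is é, written as an escape so that the source file's own
;; encoding cannot interfere.
(encoded-bytes "\u00e9" "ISO-8859-1") ; one byte
(encoded-bytes "\u00e9" "UTF-8")      ; two bytes
```

Looking at the bytes rather than at the console separates two questions: what
the writer emitted, and how the console chose to display it.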

So my theory, which is still almost certainly wrong, is:

1. When the input is a file whose encoding is, say, latin-1, it's easy
to decode it and then encode it however one wants.
2. When the input is a literal string in the source file, it looks
like it's impossible to encode it correctly, unless one first decodes
it from the source file's encoding. But then, I don't yet know how to
do this without actually reading the source file. :\
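
A small experiment along the lines of point 2. It is only a sketch, and it
rests on an assumption I haven't verified: that the literal's bytes were
decoded with the wrong charset before Clojure ever saw the string. The \u
escapes stand in for the accented letters so that the snippet itself is immune
to that problem. The names clean, round-trip and damaged are made up for the
illustration.

```clojure
;; "québécois", written with escapes instead of raw accented letters.
(def clean "qu\u00e9b\u00e9cois")

;; Encoding and then decoding with the same charset gives back an equal
;; string, so this kind of round trip cannot repair anything.
(def round-trip (String. (.getBytes clean "ISO-8859-1") "ISO-8859-1"))
(= clean round-trip) ; true

;; Assumed failure mode: Latin-1 bytes decoded as if they were UTF-8.
;; Each lone 0xE9 byte is malformed UTF-8, so the decoder substitutes
;; U+FFFD, the replacement character.
(def damaged (String. (.getBytes clean "ISO-8859-1") "UTF-8"))

;; Once U+FFFD is in the string, the original letter is gone. Latin-1
;; cannot represent U+FFFD, so encoding it yields '?'.
(String. (.getBytes damaged "ISO-8859-1") "ISO-8859-1") ; "qu?b?cois"
```

If that assumption holds, it would be consistent with the `?` output above: the
damage happens on the way in, and no choice of output encoding can undo it.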


Daniel Jomphe wrote:
> I tried under eclipse.
>
> Default console encoding configuration (MacRoman):
>
>   #'user/s
>   quÔøΩbÔøΩcois franÔøΩaisnil
>   qu?b?cois fran?aisnil
>
>   #'user/snc
>   qu?b?cois fran?aisnil
>   qu?b?cois fran?aisnil
>
> Console configured to print using ISO-8859-1:
>
>   #'user/s
>   qu�b�cois fran�aisnil
>   qu?b?cois fran?aisnil
>
>   #'user/snc
>   qu?b?cois fran?aisnil
>   qu?b?cois fran?aisnil
>
> Console configured to print using UTF-8:
>
>   #'user/s
>   québécois françaisnil
>   québécois françaisnil
>
>   #'user/snc
>   québécois françaisnil
>   québécois françaisnil
>
> So as I come to understand it, it looks like UTF-8 should be the
> Rolls-Royce for my needs.
>
> May I correctly conclude the following?
>
>   Don't bother about encodings unless you're displaying something and
> it's unreadable; then, don't bother about it in the code; find a
> proper console or viewer.
>
> Doesn't that sound like offloading a problem to users? Isn't there
> something reliable that can be done in the code?
>
> Daniel Jomphe wrote:
> > Sorry for all these posts.
>
> > I pasted my last post's code into a fresh repl (not in my IDE), and
> > here's what I got (cleaned up):
>
> >   #'user/s
> >   québécois françaisnil
> >   qu?b?cois fran?aisnil
>
> >   #'user/snc
> >   québécois françaisnil
> >   qu?b?cois fran?aisnil
>
> > I'm not sure what to make out of it.
>
> > My terminal (Apple Terminal) supports the encoding, and prints
> > correctly s and snc out of the box.
> > When I use with-out-encoded, I actually screw up both s and snc's
> > printing.
>
> > Daniel Jomphe wrote:
> > > Now that I know for sure how to bind *out* to something else over
> > > System/out, it's time to bring back my encoding issues into scope:
>
> > >   (import '(java.io PrintWriter PrintStream))
>
> > >   (defmacro with-out-encoded
> > >     [encoding & body]
> > >     `(binding [*out* (java.io.PrintWriter.
> > >                       (java.io.PrintStream. System/out true ~encoding)
> > >                       true)]
> > >        ~@body
> > >        (flush)))
>
> > >   (def nc "ISO-8859-1")
>
> > >   ;;; with a normal string
> > >   (def s "québécois français")
>
> > >   (print s)
> > >   ; quÔøΩbÔøΩcois franÔøΩaisnil
>
> > >   (with-out-encoded nc (print s))
> > >   ; qu?b?cois fran?aisnil
>
> > >   ;;; with a correctly-encoded string
> > >   (def snc (String. (.getBytes s nc) nc))
>
> > >   (print snc)
> > >   ; qu?b?cois fran?aisnil
>
> > >   (with-out-encoded nc (print snc))
> > >   ; qu?b?cois fran?aisnil
>
> > > I'm certainly missing something fundamental somewhere.