samppi wrote:
> I see. Does this mean that, if I expect to handle 32-bit characters,
> then I need to consider changing my character-handling functions to
> accept sequences of vectors instead?
>
> Also, how does (seq "\ud800\udc00") work? Does it split the character
> into two 16-bit characters? In the REPL, it seems to return (\? \?).
>   

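seq on a String returns a sequence of Java chars (16-bit values), so a
character outside the 16-bit range comes back as its two surrogate halves.
To iterate over whole code points instead: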
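
As a quick illustration of that, here is a minimal REPL sketch, assuming a
standard JVM Clojure. Converting each char to an int makes the surrogate pair
visible; the \? in your REPL is presumably just the console failing to render
a lone surrogate. The name `s` is only for the example.

```clojure
;; U+10000 is stored in a Java String as a surrogate pair of two chars.
(def s "\ud800\udc00")

(count s)                        ; 2 chars (UTF-16 code units)
(map int (seq s))                ; (55296 56320), i.e. 0xD800 0xDC00
(.codePointCount s 0 (count s))  ; 1 code point
```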

(defn codepoints-seq [s] ; returns a seq of ints
  (let [s (str s)
        n (count s)
        f (fn this [i]
            (lazy-seq
              (when (< i n)
                (cons (.codePointAt s i)
                  (this (.offsetByCodePoints s i 1))))))]
    (f 0)))

;; (codepoints-seq "\ud800\udc00a\ud800\udd00")
;; => (65536 97 65792)
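
On your first question, one possible approach (only a sketch, not the one
right answer) is to have your functions work on seqs of ints like the one
above and rebuild a String at the end. The helper name `codepoints->str` is
hypothetical; it relies only on the JDK's StringBuilder.appendCodePoint.

```clojure
(defn codepoints->str [cps] ; hypothetical helper: seq of ints -> String
  (let [sb (StringBuilder.)]
    (doseq [cp cps]
      ;; appendCodePoint emits a surrogate pair when cp is above 0xFFFF
      (.appendCodePoint sb (int cp)))
    (str sb)))

;; (codepoints->str [65536 97 65792])
;; gives back the string "\ud800\udc00a\ud800\udd00"
```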

-- 
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.blogspot.com/ (en)


