Perhaps this is so that invalid character streams (e.g. mismatched or orphaned surrogates) can survive encoding and decoding (I haven't tested this)? Strictly speaking, not every CharSequence can be validly encoded as UTF-8; Java just kind of hides this. For example, here is a reversed surrogate pair (or two orphaned surrogates, take your pick):
    (mapv #(Integer/toHexString (int %)) (String. (.getBytes "\uDC00\uD800" "UTF-8") "UTF-8"))
    => ["3f" "3f"]

Note that Java's UTF-8 encoder translates each of these to "?" (0x3f), losing the original char values.

That said, if this is the reason, it would make more sense for fressian to say "we have a custom encoding that is mostly UTF-8 except that it preserves invalid UTF-16" than "this is UTF-8".

I wonder whether other fressian implementations handle this the same way. JavaScript shares Java's UTF-16 string type, but not every platform does.

On Thursday, November 7, 2019 at 6:51:40 AM UTC-6, Kyle Wilt wrote:
>
> I posted an issue about this to the datomic/fressian github page but I
> don't know if anyone is monitoring it anymore.
>
> https://github.com/Datomic/fressian/issues/7
>
> I'm trying to find out if this is intentional for some reason or a bug.
> Right now it encodes UTF16 surrogate pairs as two 3 byte values for 10FFFF
> rather than one 4 byte value as expected.
>

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
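To make the two encodings in this thread concrete, here is a minimal Java sketch (Java rather than Clojure, since the behaviour in question is the JVM's). `encodePerChar` and `decodePerChar` are hypothetical helpers, not fressian's code: they assume that "two 3 byte values" means each UTF-16 code unit is encoded on its own, CESU-8 style. The sketch compares that scheme with the JDK's standard UTF-8 encoding for U+10FFFF and for the reversed surrogate pair above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SurrogateEncoding {

    // Hypothetical per-code-unit encoder (CESU-8 style): every UTF-16 char,
    // including each half of a surrogate pair, becomes its own 1-3 byte sequence.
    // Illustration only; this is NOT taken from fressian's source.
    static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);                              // 1 byte: 0xxxxxxx
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));                // 2 bytes: 110xxxxx 10xxxxxx
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));               // 3 bytes, surrogates included
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Inverse of encodePerChar: rebuilds each char from its 1-3 byte sequence.
    // Assumes input produced by encodePerChar; it does no validation.
    static String decodePerChar(byte[] b) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < b.length) {
            int b0 = b[i] & 0xFF;
            if (b0 < 0x80) {
                sb.append((char) b0);
                i += 1;
            } else if (b0 < 0xE0) {
                sb.append((char) (((b0 & 0x1F) << 6) | (b[i + 1] & 0x3F)));
                i += 2;
            } else {
                sb.append((char) (((b0 & 0x0F) << 12)
                        | ((b[i + 1] & 0x3F) << 6)
                        | (b[i + 2] & 0x3F)));
                i += 3;
            }
        }
        return sb.toString();
    }

    // Formats bytes as space-separated lowercase hex, e.g. "f4 8f bf bf".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10FFFF as a valid surrogate pair (high U+DBFF, low U+DFFF).
        String maxCodePoint = new String(Character.toChars(0x10FFFF));
        // Low surrogate before high surrogate: not valid UTF-16.
        String reversed = "\uDC00\uD800";

        // Standard UTF-8: one 4-byte sequence for the whole code point.
        System.out.println(hex(maxCodePoint.getBytes(StandardCharsets.UTF_8)));
        // Per-char encoding: two 3-byte sequences, one per surrogate.
        System.out.println(hex(encodePerChar(maxCodePoint)));

        // String.getBytes replaces each unpaired surrogate with '?' (0x3f).
        System.out.println(hex(reversed.getBytes(StandardCharsets.UTF_8)));
        // Per-char encoding keeps them and round-trips to the same chars.
        System.out.println(reversed.equals(decodePerChar(encodePerChar(reversed))));
    }
}
```

Run as is, `main` prints `f4 8f bf bf`, `ed af bf ed bf bf`, `3f 3f` and `true`: standard UTF-8 spends 4 bytes on U+10FFFF but drops the orphaned surrogates, while the per-char scheme spends 6 bytes on U+10FFFF and carries the invalid string through intact. Java's own "modified UTF-8" (`DataOutput.writeUTF`) handles surrogates the same per-char way, apart from its two-byte encoding of U+0000 and its length prefix.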