I'm currently working on an implementation for the CLR, which is why I'm 
looking at it. The CLR handles UTF-16 surrogate pairs more or less the same 
way JS and Java do. I'd be surprised if this was intentional, but since the 
code has no comments I can only speculate :-)
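
To make the difference concrete, here is a minimal sketch on the JVM (not 
the CLR) of the two byte layouts for U+10FFFF. encode-3-byte and hex are 
hypothetical helpers I made up for illustration; this is not fressian's 
actual code, only my reading of the behavior described in the issue, where 
each UTF-16 code unit gets the 3-byte UTF-8 bit layout with no special 
handling of surrogates.

```clojure
;; Sketch only: encode-3-byte is a hypothetical helper, not fressian's code.
(defn encode-3-byte
  "Encodes one UTF-16 code unit in the range 0x0800-0xFFFF using the
  3-byte UTF-8 bit layout, without checking whether it is a surrogate."
  [ch]
  (let [c (int ch)]
    [(bit-or 0xE0 (bit-shift-right c 12))
     (bit-or 0x80 (bit-and (bit-shift-right c 6) 0x3F))
     (bit-or 0x80 (bit-and c 0x3F))]))

(defn hex
  "Renders a sequence of byte values as two-digit hex strings."
  [bs]
  (mapv #(format "%02x" (bit-and (int %) 0xFF)) bs))

;; One code point, stored as the surrogate pair \uDBFF \uDFFF.
(def s (String. (Character/toChars 0x10FFFF)))

;; Standard UTF-8: a single 4-byte sequence.
(hex (.getBytes s "UTF-8"))
;; => ["f4" "8f" "bf" "bf"]

;; Per-code-unit encoding: two 3-byte sequences, one per surrogate.
(hex (mapcat encode-3-byte s))
;; => ["ed" "af" "bf" "ed" "bf" "bf"]
```

The second form is what a strict UTF-8 decoder would reject, since the 
bytes ED A0..BF would decode to surrogate code points.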

On Thursday, November 7, 2019 at 2:20:44 PM UTC-5, Francis Avila wrote:
>
> Perhaps this is so that invalid character streams (e.g. mismatched or 
> orphaned surrogates) can survive encoding and decoding (I haven't tested)? 
> Strictly speaking, not every CharSequence can be validly encoded as 
> UTF-8. Java just kind of hides this. For example, this is a reversed 
> surrogate pair (or two orphaned surrogates, take your pick):
>
> (mapv #(Integer/toHexString (int %)) (String. (.getBytes "\uDC00\uD800" 
> "UTF-8") "UTF-8"))
> => ["3f" "3f"]
>
> Note that Java's UTF-8 encoder replaces each of these with "?" (0x3f), 
> losing information about the original char value.
>
> That said, if this is the case, it makes more sense for fressian to say 
> "we have a custom encoding that is mostly UTF-8 except it preserves invalid 
> UTF-16" than "this is UTF-8". I wonder if other fressian implementations 
> handle this the same way? JavaScript also shares Java's UTF-16 string type, 
> but not every platform does.
>
>
> On Thursday, November 7, 2019 at 6:51:40 AM UTC-6, Kyle Wilt wrote:
>>
>> I posted an issue about this to the datomic/fressian github page but I 
>> don't know if anyone is monitoring it anymore.
>>
>> https://github.com/Datomic/fressian/issues/7
>>
>> I'm trying to find out if this is intentional for some reason or a bug. 
>> Right now it encodes a UTF-16 surrogate pair (e.g. the one for U+10FFFF) 
>> as two 3-byte values rather than one 4-byte value as expected.
>>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
To view this discussion on the web visit 
https://groups.google.com/d/msgid/clojure/c7a1f6c2-56b7-4a9e-af5a-9bbf1758783d%40googlegroups.com.
