I am trying to parse a stream from a TCP connection.

I think the data is UTF-8; here is a sample:

         20 2d 20 c8 65 73 6b fd 20 72 6f 7a 68 6c 61 73

which, when I print it, gives:

     -       e  s  k       r  o  z  h  l  a  s           
           ^          ^
        missing    missing

There are two missing characters. OK, bad UTF-8 perhaps?
But when I try unicode(1) on the two offending bytes I see:

        unicode c8 fd
        È
        ý
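
To double-check this outside of unicode(1), here is a minimal Python 3 sketch
(my own, not part of the program reading the stream). It tests the sample as
strict UTF-8 and then decodes it with a few single-byte encodings. The names
is_valid_utf8, decode_candidates and CANDIDATES are made up for illustration,
and the candidate list is purely a guess; nothing in the stream names an
encoding.

```python
# Sample bytes copied from the hex dump above.
sample = bytes.fromhex("20 2d 20 c8 65 73 6b fd 20 72 6f 7a 68 6c 61 73")

# Assumption: a guessed list of single-byte encodings worth trying.
CANDIDATES = ["latin-1", "iso8859_2", "cp1250"]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_candidates(data: bytes) -> dict:
    """Decode data with each guessed encoding; map encoding name to text."""
    return {name: data.decode(name) for name in CANDIDATES}


if __name__ == "__main__":
    print("valid UTF-8:", is_valid_utf8(sample))
    for name, text in decode_candidates(sample).items():
        print(f"{name:10} {text!r}")
```

The Latin-1 row should show the same È and ý that unicode(1) printed, because
Latin-1 byte values coincide with the first 256 Unicode code points. The other
rows may show different letters for the same bytes.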

Are these 8-bit runes? (!)
Is there a name for such a thing?
Is this common?
Or is it just an MS code page whose values above 0x7f happen (or were designed)
to map onto the same letters as the Unicode code points of the same value?

Thanks in advance for any useful suggestions ☺

-Steve

