On 08.02.2016 23:31, Christopher Schultz wrote:

All,

On 2/8/16 3:43 PM, Mark Thomas wrote:
On 08/02/2016 18:41, Jason Ricles wrote:
I have an application that sends binary websocket messages
between a class and the web application using a websocket server
written in java.

The data being sent from the java class is encoded in a binary
buffer with the bytes in ISO8859_1. However, when I receive the
bytes on the websocket server and the web application end they
are junk (such as -121, -116, etc.) and not encoded the correct
way that they need to be.

The bytes are transmitted as unsigned on the wire (as required by
the WebSocket spec). Java handles them as signed. You need to
convert them. Something like (untested):

char c = b & 0xFF;

I had to read this something like 10 times before I convinced myself
that this was correct. For those who want to know why this makes any
kind of sense (because, at first glance, it does not), I'll explain it.

For starters, Java uses signed byte primitives but /unsigned/ char
primitives. For those coming from the C world, that may be confusing:
bytes are 8 bits (signed) and chars are 16 bits (unsigned).

But Java doesn't define any arithmetic operations (including
bitwise) for anything smaller than an int (32 bits, signed), so the
above assignment is actually more like this:

byte b = (byte) 0xab; // e.g.; the cast is needed, since 0xab is out of byte range
char c = (char) (((int) b) & 0xff);

So, first b is widened from 8 bits to 32 bits -- with a
sign-extension. That means that -1 is still -1, it's just represented
by a different bit pattern: 1111 1111 1111 1111 1111 1111 1111 1111
instead of 1111 1111.

Next, the bitwise & is performed, which zeroes out everything but the
bottom 8 bits (now we have .... .... 0000 0000 1111 1111). Then, that
value is cast to char, which here changes nothing, since only the low
8 bits are set.

Going back to the -1 example, we get a final value of 255 for c, which is
exactly what you'd expect for an unsigned char whose signed value is -1.

I think the only surprising thing there is that Java widens all types to
32-bit signed int to perform these operations. Without that fact, the
above assignment doesn't make much sense. In C, that line of code
would do absolutely nothing at all.
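
Putting it all together, something like this (untested, just a sketch)
should show both views of the same byte:

byte b = (byte) 0xab;           // -85 when read as a signed Java byte
int unsigned = b & 0xff;        // widen to int (sign-extended), then mask
char c = (char) unsigned;       // narrow back down to an unsigned 16-bit char
System.out.println(b);          // -85
System.out.println(unsigned);   // 171
System.out.println((int) c);    // 171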


Would a simpler way to say this not be that in Java, a char is a 16-bit unsigned integer whose value happens to be the corresponding character's Unicode code point?
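
For instance (just an illustration, not tested):

char c = 'é';
System.out.println((int) c); // prints 233, i.e. U+00E9

which is also the ISO-8859-1 code for 'é', since ISO-8859-1 happens to
map one-to-one onto the first 256 Unicode code points.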

Of course this all takes us further away from the OP's original description of the issue, which said "The data being sent from the java class is encoded in a binary
buffer with the bytes in ISO8859_1."
Which basically doesn't make sense, unless the data in question is originally
text.
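
If it is text, then the clean way would be to make the charset explicit
on both ends, along these lines (a rough sketch using
java.nio.charset.StandardCharsets, untested):

import java.nio.charset.StandardCharsets;

String text = "héllo";  // whatever text is being sent
byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
String back = new String(bytes, StandardCharsets.ISO_8859_1);

Any byte above 127 (e.g. 0xE9 for 'é') will print as a negative number
when you look at it as a signed Java byte, which may well be where the
OP's -121 and -116 come from.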


