On 09.02.2016 15:06, Christopher Schultz wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

André,

On 2/8/16 6:25 PM, André Warnier (tomcat) wrote:
On 08.02.2016 23:31, Christopher Schultz wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

All,

On 2/8/16 3:43 PM, Mark Thomas wrote:
On 08/02/2016 18:41, Jason Ricles wrote:
I have an application that sends binary websocket messages
between a class and the web application using a websocket
server written in java.

The data being sent from the java class is encoded in a
binary buffer with the bytes in ISO8859_1. However, when I
receive the bytes on the websocket server and the web
application end they are junk (such as -121, -116, etc.) and
not encoded the correct way that they need to be.

The bytes are transmitted as unsigned on the wire (as required
by the WebSocket spec). Java handles them as signed. You need
to convert them. Something like (untested):

char c = (char) (b & 0xFF);
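
For a whole binary message rather than a single byte, the same masking can be applied across the buffer. This is only a sketch: it assumes Java 8 or later (for Byte.toUnsignedInt) and that the payload arrives as a ByteBuffer. The class and method names are made up for illustration and are not part of Tomcat or the WebSocket API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; the names are illustrative only.
class UnsignedPayload {

    // Returns each byte of the payload as a value in the range 0..255.
    static int[] toUnsignedValues(ByteBuffer payload) {
        ByteBuffer view = payload.duplicate(); // do not disturb the caller's position
        int[] values = new int[view.remaining()];
        for (int i = 0; i < values.length; i++) {
            values[i] = Byte.toUnsignedInt(view.get()); // equivalent to b & 0xFF
        }
        return values;
    }

    public static void main(String[] args) {
        // The "junk" values from the original report, as Java sees them.
        ByteBuffer payload = ByteBuffer.wrap(new byte[] {-121, -116});
        for (int v : toUnsignedValues(payload)) {
            System.out.println(v); // 135, then 140
        }
    }
}
```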

I had to read this something like 10 times before I convinced
myself that this was correct. For those who want to know why
this makes any kind of sense (because, at first glance, it does
not make any sense), I'll explain it.

For starters, Java uses signed byte primitives but /unsigned/
char primitives. For those coming from the C world, that may be
confusing. bytes are 8 (signed) bits and chars are 16 (unsigned)
bits.

But Java doesn't have any defined arithmetic operations
(including bitwise) for anything smaller than an int (32 signed
bits), so the above assignment is actually more like this:

byte b = (byte) 0xab; // e.g.
char c = (char) (((int) b) & 0xff);

So, first b is widened from 8 bits to 32 bits -- with a
sign-extension. That means that -1 is still -1, it's just
represented by a different bit pattern: 1111 1111 1111 1111 1111
1111 1111 1111 instead of 1111 1111.

Next, the bitwise & is performed, which zeroes out everything but
the bottom 8 bits (now we have .... .... 0000 0000 1111 1111).
Then, that value is cast to char which does practically nothing.

In the above example (-1), we get a final value of 255 for c,
which is exactly what you'd expect for an unsigned char whose
signed value is -1.

I think the only surprising thing there is that Java widens all
types to 32-bit signed int to perform these operations. Without
that fact, the above assignment doesn't make much sense. In C,
that line of code would do absolutely nothing at all.
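
The widening, masking and narrowing steps described above can be traced one at a time. A minimal sketch; the class and method names are hypothetical and exist only to make the implicit conversions visible.

```java
// Hypothetical demo class; the names are illustrative only.
class SignedByteDemo {

    // Same conversion as in the thread, with each implicit step spelled out.
    static char toUnsignedChar(byte b) {
        int widened = b;             // sign-extended to 32 bits: -1 stays -1
        int masked = widened & 0xFF; // keep only the low 8 bits: 0..255
        return (char) masked;        // narrowing to 16 bits loses nothing here
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF; // -1 as a signed byte
        System.out.println(Integer.toBinaryString(b));        // 32 ones
        System.out.println(Integer.toBinaryString(b & 0xFF)); // 11111111
        System.out.println((int) toUnsignedChar(b));          // 255
    }
}
```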


Would a simpler way to say this not be that in Java, a char is a
16-bit integer whose value happens to be the corresponding
character's Unicode codepoint ?

If you want to be pedantic (and I know you do!),

this time I hesitated..

 a Java character is a
subset of Unicode codepoints. Unicode specifies more than 2^16
codepoints (or, at least, the range exceeds what 2^16 addresses
covers). If you want to use actual Unicode codepoints, you need to use
Java int -- which is why String.codePointAt returns int and not char.
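
To see the difference between a char and a code point, here is a small sketch using a character outside the Basic Multilingual Plane. U+1F600 is chosen arbitrarily as an example; the class and method names are hypothetical.

```java
// Hypothetical demo class; the names are illustrative only.
class CodePointDemo {

    // Builds a String holding a single Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600); // outside the BMP, so it needs two chars

        System.out.println(s.length());                          // 2 (a surrogate pair)
        System.out.println(Integer.toHexString(s.charAt(0)));    // d83d (high surrogate)
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600
        System.out.println(s.codePointCount(0, s.length()));     // 1
    }
}
```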


Well, I was planning to add a proviso about Unicode characters that
are not part of the Basic Multilingual Plane (and thus have code
points above 2^16-1), but I figured that the matter was already
confusing enough.
I found an old but good article about this topic:
http://www.javaworld.com/article/2076571/java-se/an-in-depth-look-at-java-s-character-type.html
And this must be the bible:
https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html

Of course this all takes us further away from the OP's original
description of the issue, which said "The data being sent from the
java class is encoded in a binary buffer with the bytes in
ISO8859_1." Which basically doesn't make sense, unless the data in
question is originally text.
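
If the payload really is ISO-8859-1 text, one option is to let the charset decoder do the conversion instead of masking bytes by hand. This is a sketch under that assumption only; nothing in the thread confirms what the OP's data actually contains, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class Latin1Demo {

    // Decodes bytes as ISO-8859-1, where byte value N maps to code point N.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] wire = {(byte) 0xE9, 0x41}; // -23 and 65 as signed bytes
        String text = decode(wire);

        System.out.println((int) text.charAt(0)); // 233, same as wire[0] & 0xFF
        System.out.println((int) text.charAt(1)); // 65
    }
}
```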

Of course it makes no sense at all. Binary is binary and character
encoding is a property of text. Perhaps what he meant was that it
wasn't XML or some fancy Web 2.0 thingy. But of course, he's using
Websocket which is, by definition, Web 2.0. Welcome to the new binary!
Text-encoding of binary data across a text-based channel. Or something
like that.

- -chris
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAla58lQACgkQ9CaO5/Lv0PC+lACgo1yaNVCR0irOrk5hUSw3iury
+BIAoLQElOEZylktC5u8ZIo5GaurP855
=a2zc
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org


