-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

André,

On 2/8/16 6:25 PM, André Warnier (tomcat) wrote:
> On 08.02.2016 23:31, Christopher Schultz wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> All,
>>
>> On 2/8/16 3:43 PM, Mark Thomas wrote:
>>> On 08/02/2016 18:41, Jason Ricles wrote:
>>>> I have an application that sends binary websocket messages
>>>> between a class and the web application using a websocket
>>>> server written in java.
>>>>
>>>> The data being sent from the java class is encoded in a
>>>> binary buffer with the bytes in ISO8859_1. However, when I
>>>> receive the bytes on the websocket server and the web
>>>> application end they are junk (such as -121, -116, etc.) and
>>>> not encoded the correct way that they need to be.
>>>
>>> The bytes are transmitted as unsigned on the wire (as required
>>> by the WebSocket spec). Java handles them as signed. You need
>>> to convert them. Something like (untested):
>>>
>>> char c = (char) (b & 0xFF);
>>
>> I had to read this something like 10 times before I convinced
>> myself that this was correct. For those who want to know why
>> this makes any kind of sense (because, at first glance, it does
>> not make any sense), I'll explain it.
>>
>> For starters, Java uses signed byte primitives but /unsigned/
>> char primitives. For those coming from the C world, that may be
>> confusing. bytes are 8 (signed) bits and chars are 16 (unsigned)
>> bits.
>>
>> But Java doesn't have any defined arithmetic operations
>> (including bitwise) for anything smaller than an int (32 signed
>> bits), so the above assignment is actually more like this:
>>
>> byte b = (byte) 0xab; // e.g.
>> char c = (char) ( ((int)b) & 0xff );
>>
>> So, first b is widened from 8 bits to 32 bits -- with a
>> sign-extension. That means that -1 is still -1, it's just
>> represented by a different bit pattern: 1111 1111 1111 1111 1111
>> 1111 1111 1111 instead of 1111 1111.
>>
>> Next, the bitwise & is performed, which zeros-out everything but
>> the bottom 8 bits (now we have .... .... 0000 0000 1111 1111).
>> Then, that value is cast to char which does practically nothing.
>>
>> In the above example (-1), we get a final value of 255 for c,
>> which is exactly what you'd expect for an unsigned char whose
>> signed value is -1.
>>
>> I think the only surprising thing there is that Java widens all
>> types to 32-bit signed int to perform these operations. Without
>> that fact, the above assignment doesn't make much sense. In C,
>> that line of code would do absolutely nothing at all.
>>
>
> Would a simpler way to say this not be that in Java, a char is a
> 16-bit integer whose value happens to be the corresponding
> character's Unicode codepoint ?

If you want to be pedantic (and I know you do!), a Java char can
represent only a subset of Unicode codepoints. Unicode specifies more
than 2^16 codepoints (or, at least, the range exceeds what 2^16
values can cover). If you want to use actual Unicode codepoints, you
need to use Java int -- which is why String.codePointAt returns int
and not char.

> Of course this all takes us further away from the OP's original
> description of the issue, which said "The data being sent from the
> java class is encoded in a binary buffer with the bytes in
> ISO8859_1." Which basically doesn't make sense, unless the data in
> question is originally text.

Of course it makes no sense at all. Binary is binary and character
encoding is a property of text. Perhaps what he meant was that it
wasn't XML or some fancy Web 2.0 thingy. But of course, he's using
Websocket which is, by definition, Web 2.0.

Welcome to the new binary! Text-encoding of binary data across a
text-based channel. Or something like that.

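For anyone who wants to poke at this, here is a minimal sketch of the
sign-extend-then-mask conversion and of the char versus codepoint
distinction discussed above. The class name UnsignedBytes and the
helpers toUnsigned and decodeLatin1 are made up for illustration;
they are not from the OP's code or from Tomcat. It also assumes the
payload really is ISO-8859-1 text, which is only a guess at what the
OP meant.

```java
import java.nio.charset.StandardCharsets;

final class UnsignedBytes {

    // Hypothetical helper: the byte is sign-extended to a 32-bit int,
    // then the mask keeps only the low 8 bits, giving 0..255.
    // Byte.toUnsignedInt(b) in the JDK (Java 8+) does the same thing.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Hypothetical helper: decode bytes as ISO-8859-1 text. This is only
    // meaningful if the bytes were text to begin with (an assumption).
    static String decodeLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xAB;                 // plain "byte b = 0xab" does not compile
        System.out.println(b);                // signed view of the bit pattern
        System.out.println(toUnsigned(b));    // unsigned view of the same bits

        // The "junk" values from the original report, viewed as unsigned.
        System.out.println(toUnsigned((byte) -121));
        System.out.println(toUnsigned((byte) -116));

        // 0xE9 is e-acute in ISO-8859-1; it decodes to a single char.
        String text = decodeLatin1(new byte[] { (byte) 0xE9 });
        System.out.println((int) text.charAt(0));

        // U+1F600 lies outside the 16-bit range, so it takes two chars
        // (a surrogate pair) and codePointAt has to return an int.
        String smiley = "\uD83D\uDE00";
        System.out.println(smiley.length());
        System.out.println(Integer.toHexString(smiley.codePointAt(0)));
    }
}
```

The masking is only needed when you want to look at the bytes as
numbers; if they are going straight into a String, handing the byte[]
to a Charset does the right thing without any manual conversion.
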
- -chris
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAla58lQACgkQ9CaO5/Lv0PC+lACgo1yaNVCR0irOrk5hUSw3iury
+BIAoLQElOEZylktC5u8ZIo5GaurP855
=a2zc
-----END PGP SIGNATURE-----