RE: ESME - Opensmppbox - Kannel - SMSC Encoding problem

Yegor Ivaschenko Thu, 21 May 2015 03:07:57 -0700

Dear Aurel,

Thank you for your quick reply.


I'm not really sure what I should do next: reply to you directly, reply only 
to users@kannel, or both.

Please tell me the proper procedure.


Anyway, I did a lot of testing of my ESME's behaviour.

It has a web interface with a message input box and a status bar that shows 
how many characters have been entered and the upper limit for the current 
message (e.g. < 5 / 140 >).

As far as I know,


UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable 
of encoding all 1,112,064 possible characters in Unicode. The encoding is 
variable-length, as code points are encoded with one or two 16-bit code units.

UTF-16 developed from an earlier fixed-width 16-bit encoding known as UCS-2 
(for 2-byte Universal Character Set) once it became clear that a fixed-width 
2-byte encoding could not encode enough characters to be truly universal. 
(http://en.wikipedia.org/wiki/UTF-16).


So UCS-2 is always a fixed-width 16-bit encoding.

UCS-2 is also always big-endian (BE). This is what Wikipedia says:


UCS-2 encoding is defined to be big-endian only. In practice most software 
defaults to little-endian,[citation needed] and handles a leading BOM to 
define the byte order just as in UTF-16. Although the similar designations 
UCS-2BE and UCS-2LE imitate the UTF-16 labels, they do not represent official 
encoding schemes. (http://en.wikipedia.org/wiki/UTF-16)
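
To make the two points quoted above concrete, here is a minimal Python sketch. It is purely illustrative and is not taken from the ESME or Kannel; the helper name utf16_code_units is hypothetical. It shows that a character such as "ф" needs one 16-bit code unit while a character outside the Basic Multilingual Plane needs two, and how the same character is laid out in big-endian and little-endian byte order.

```python
def utf16_code_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


cyrillic_ef = "\u0444"   # "ф", inside the Basic Multilingual Plane
emoji = "\U0001F600"     # outside the BMP, so it needs a surrogate pair

print(utf16_code_units(cyrillic_ef))  # 1
print(utf16_code_units(emoji))        # 2

# Same character, two byte orders (no BOM is added by these codecs).
print(cyrillic_ef.encode("utf-16-be").hex())  # 0444
print(cyrillic_ef.encode("utf-16-le").hex())  # 4404
```

UCS-2 can only represent the first kind of character, which is why it is fixed-width.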


Well, when I enter something like "asdfAEaaee " in my ESME (these are 
actually characters from the ISO 8859-1 character table), the status bar 
shows (9/140). When I send it to opensmppbox, Wireshark shows that it is 
encoded with 1 byte per character.

So it can't be UCS-2 or UTF-16. It's definitely Latin-1 (or another member of 
the ISO 8859 family).


When I add a non-Latin-1 character (like Russian "ф" or Japanese "テ" (te)), 
e.g. "asdfAEaaee фテ", the status bar of the ESME's message input box changes 
the upper limit to 70. In the Wireshark dump I can see that the message sent 
to opensmppbox starts with the bytes "FF FE", a byte order mark (BOM) that 
indicates little-endian (LE) encoding. Each character is encoded with 2 bytes.

So again, it's not UCS-2 (which, according to the standard, has to be 
big-endian without a BOM). It's either UTF-16 LE or unofficial UCS-2.
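
The two wire formats described above can be reproduced with a short Python sketch. This is only an illustration under assumptions: the sample strings are shortened stand-ins, and the variable names are hypothetical. It does not reproduce the actual ESME payloads from the dump.

```python
import codecs

latin1_only = "asdf"    # every character exists in ISO 8859-1
mixed = "asdf фテ"      # contains characters outside ISO 8859-1

# Case 1: one byte per character, no BOM.
one_byte = latin1_only.encode("latin-1")

# Case 2: explicit little-endian BOM (FF FE) followed by 2 bytes per character.
two_byte = codecs.BOM_UTF16_LE + mixed.encode("utf-16-le")

print(one_byte.hex(" "))  # 61 73 64 66
print(two_byte.hex(" "))  # ff fe 61 00 73 00 64 00 66 00 20 00 44 04 c6 30
```

The second line starts with ff fe and stores the low byte of each character first, which matches the little-endian layout seen in the capture.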


So my ESME actually uses two encoding schemes, depending on the message 
itself. If needed, I can provide dumps and logs.


So, one more time: the SMSC needs only pure UCS-2 (or UTF-16 BE without a 
BOM). My ESME sends messages using ISO 8859-1 (or another ISO 8859 variant) 
and UTF-16 LE.

My question is: is there any way to convert all messages coming from the ESME 
to UTF-16 BE by means of Kannel, or some additional software paired with 
Kannel?
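
To show the kind of conversion meant here, a minimal Python sketch follows. It is not a Kannel or opensmppbox feature; the function name to_utf16_be is hypothetical, and the choice of source encoding relies on an assumption: a payload is treated as UTF-16 only when it starts with a BOM, and as ISO 8859-1 otherwise.

```python
import codecs


def to_utf16_be(payload: bytes) -> bytes:
    """Re-encode a message payload as UTF-16 BE without a BOM.

    Assumption: a leading BOM marks UTF-16; anything else is ISO 8859-1.
    """
    if payload.startswith(codecs.BOM_UTF16_LE):
        text = payload[2:].decode("utf-16-le")
    elif payload.startswith(codecs.BOM_UTF16_BE):
        text = payload[2:].decode("utf-16-be")
    else:
        text = payload.decode("latin-1")
    # The "utf-16-be" codec never writes a BOM.
    return text.encode("utf-16-be")


print(to_utf16_be(b"\xff\xfe\x44\x04").hex())  # 0444
print(to_utf16_be(b"as").hex())                # 00610073
```

The BOM heuristic is ambiguous in one case: a Latin-1 message that really begins with the characters "ÿþ" has the same first two bytes as a little-endian BOM. A real solution would more likely decide based on the data_coding value of the PDU.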



By the way, can you tell me where the ticket for this problem is?

Maybe I can help by providing further information, such as a Wireshark dump, 
Kannel logs and configs, etc.



Best regards,

Ivashchenko Yegor

________________________________
From: Aurel Branzeanu <branzeanu.au...@gmail.com>
Sent: 21 May 2015 0:54
To: Yegor Ivaschenko
CC: users@kannel.org
Subject: Re: ESME - Opensmppbox - Kannel - SMSC Encoding problem

Hello, Yegor!

On Wed, May 20, 2015 at 4:30 PM, Yegor Ivaschenko 
<yegor_ivasche...@exigenebit.com> wrote:
My ESME uses 2 encoding types, depending on the characters used in the message.

I think it does not use 2 encodings, but just UCS-2, whose Latin part matches 
ISO 8859-1.

There is, however, a very nasty bug when opensmppbox receives messages in 
UCS-2 (data_coding = 8) that use two alphabets, say, English and Russian. I 
will submit a bug report right now.

--
Sincerely yours,

Aurel Branzeanu,

mailto: branzeanu.au...@gmail.com
Skype: tvorogov
GSM Orange:  +373 6 940-7700
GSM Moldcell: +373 7 940-7700
