On Sun, 13 Aug 2017 14:18:23 +0300, Juha Manninen via Lazarus <lazarus@lists.lazarus-ide.org> wrote:
>On Sun, Aug 13, 2017 at 1:21 AM, Bo Berglund via Lazarus
><lazarus@lists.lazarus-ide.org> wrote:
>> I recently had a problem with an application that was converted from
>> old string type to AnsiString and seemingly worked in the new Unicode
>> environment.
>
>What was the old string type?

Note: The programs were started back in around 2000 using Delphi 7...
We used "string" as the container for processing serial data to/from
CNC machine tool controllers, amongst others.
This was really triggered by the serial components, which mostly
transferred char(acters) and had methods for sending and receiving
strings, even though we usually used char.

>> However, we received reports that it had failed in some Asian
>> countries (Korea, China, Thailand) and upon checking it turned out
>> that the data inside a string used as buffer was changed because of
>> locale differences....
>
>Unicode was designed to solve exactly the problems caused by locale
>differences.
>Why don't you use it?

Again, these are old existing programs and we are not doing this
anymore for new programs.
However, there is still one problem because there is an interface
point to the hardware, in the form of serial components, which still
handle chars...
And chars are nowadays Unicode chars, i.e. they do not map to the
bytes sent over RS232...
And our data are NOT text, they are binary streams of bytes.

>> After switching out the affected variable declarations from AnsiString
>> to RawByteString the application seemingly started to work again also
>> on these locations.
>> ...
>> And after this I have spent some time to totally rework the use of
>> strings as buffers to instead use TBytes. Lots of work but
>> guaranteed to not sneak in unexpected conversions.
>
>RawByteString is for text which encoding is not meant to be converted.
>It has its special use cases.

My first attempt at "fixing" the problem in Asian locales was to use
RawByteString so as to inhibit conversions. Still with these as comm
buffers...
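
A minimal sketch of how this kind of corruption can arise, written in
Python purely for illustration (the applications discussed here are
Delphi/Lazarus programs). It assumes the damage comes from binary bytes
being decoded and re-encoded through the system ANSI code page; `cp949`
(Korean) stands in for an Asian locale, and the function names and the
sample frame are hypothetical.

```python
# Hypothetical illustration: binary serial data pushed through a
# locale-dependent text conversion versus kept as raw bytes.


def round_trip_via_codepage(data: bytes, codepage: str) -> bytes:
    """Decode bytes as text in `codepage`, then encode them back.

    This mimics (as an assumption) an implicit ANSI <-> Unicode
    conversion applied to a string that is used as a byte buffer.
    """
    text = data.decode(codepage, errors="replace")
    return text.encode(codepage, errors="replace")


def keep_as_bytes(data: bytes) -> bytes:
    """Copy the buffer without any text conversion (TBytes-like)."""
    return bytes(data)


# A made-up binary frame; 0x81 followed by 0x20 is not a valid
# double-byte sequence in cp949.
frame = bytes([0x02, 0x81, 0x20, 0x10, 0x03])

damaged = round_trip_via_codepage(frame, "cp949")
intact = keep_as_bytes(frame)

print("original :", frame.hex())
print("via cp949:", damaged.hex())
print("as bytes :", intact.hex())
```

In this sketch the text round trip substitutes a replacement character
for the invalid byte pair, so the frame no longer matches what was
sent, while the plain byte copy is unchanged.
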
It seemed to work out, but to be safer I have reworked one application
to use TBytes everywhere comm data are handled.

>TBytes is usually for binary data.

Exactly, and this is why I made the comment that to be on the safe
side when dealing with RS232 the buffers should be TBytes (or some
other similar construct).

>Did I understand right: you use TBytes to hold strings having Windows
>codepage encoding?

No, definitely not. At the time we were not aware of any encoding at
all. To us a string was just a handy container for the serial data,
like a dynamic array of byte with some useful functions available for
searching and things like that.
I think we were not alone...

>Again: Why not Unicode? Then you could throw away your hacks.

The application itself is Unicode now, but we had to run circles around
the RS232 comm part. When converting to Unicode we first set the comm
related strings to be AnsiString...

PS: We never programmed the serial interface directly, we always used
commercial RS232 components and they all dealt with char and string...
DS

--
Bo Berglund
Developer in Sweden