[fpc-pascal] UnicodeString and Length() function

Graeme Geldenhuys Fri, 25 Mar 2016 12:22:12 -0700

I have never really used UnicodeString (or WideString, for that matter);
I've always used AnsiString with UTF-8 content, along with my own UTF-8
versions of Copy(), Length(), etc.


Looking at UnicodeString with FPC 2.6.4, I'm a bit confused. :-/

Take the following code:

============================
{$mode objfpc}{$h+}
{--- $codepage utf8}  // disabled

var
  S: UTF8String; // for FPC 2.6.4 this is an alias for AnsiString
  U: UnicodeString;
begin
  S := 'Tiburón';
  WriteLn(Length(S));
  U := 'Tiburón';
  WriteLn(Length(U));
end.
============================

On my 64-bit FreeBSD system that outputs the following:

==========
10
8
==========

Length() returns the number of bytes, correct?

So why isn't the result 8 and 14?  The letter o with acute takes 2 bytes
in UTF-8 ($C3 $B3), so the 7 characters come to 6 + 2 = 8 bytes. In
UTF-16, each of these characters is one 16-bit word (2 bytes), so
2 bytes * 7 characters = 14 bytes. But Length() returns totally
different values from what I expected.

Enabling {$codepage utf8} made no difference to the results shown above.
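
One way to see what Length() is actually counting is to dump every
element of each string. This is only a minimal sketch, assuming HexStr()
from the System unit; the program name DumpUnits is just for
illustration, and the output will depend on the source file encoding and
on whether the codepage directive is enabled:

============================
program DumpUnits;  // hypothetical name, for illustration only
{$mode objfpc}{$h+}

var
  S: AnsiString;
  U: UnicodeString;
  i: Integer;
begin
  S := 'Tiburón';
  U := 'Tiburón';

  // One element of an AnsiString is one byte
  Write('S (', Length(S), ' elements):');
  for i := 1 to Length(S) do
    Write(' $', HexStr(Ord(S[i]), 2));
  WriteLn;

  // One element of a UnicodeString is one 16-bit WideChar
  Write('U (', Length(U), ' elements):');
  for i := 1 to Length(U) do
    Write(' $', HexStr(Ord(U[i]), 4));
  WriteLn;
end.
============================

If S really holds UTF-8, the o with acute should show up as $C3 $B3;
anything else would suggest the literal was converted or reinterpreted
on assignment.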

Could anybody explain this please?

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
