theo wrote:
@Luiz Americo

Your code
WideCompareText(UTF8Decode(Key), UTF8Decode(Str))
will work, but if speed matters, then it's rather bad.

Hi, I'm aware that the performance is bad, although I had not benchmarked it as you did. At this point, though, I'd like to stick with a solution that FPC provides natively, since it's used in an FPC component (TSqlite3Dataset).

In the last revision I switched to the ANSI versions of the functions to avoid converting the Key at each comparison. See http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/packages/fcl-db/src/sqlite/customsqliteds.pas?view=log#rev13431
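The revision mentioned above uses the ANSI functions. As a separate, hypothetical illustration of the same idea (decode the key once instead of on every comparison) with the wide functions, here is a minimal sketch. It assumes the lookup runs over a plain array of UTF-8 strings; `FindKey` and its parameters are made up for this example and are not the TSqlite3Dataset code.

```pascal
program FindKeyDemo;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // installs a real widestring manager on Unix
  SysUtils;

// Hypothetical helper: returns the index of the first item equal to Key
// ignoring case, or -1. Key is decoded once, outside the loop; only the
// candidates are decoded per comparison.
function FindKey(const Key: UTF8String; const Items: array of UTF8String): Integer;
var
  WKey: WideString;
  I: Integer;
begin
  WKey := UTF8Decode(Key);
  for I := 0 to High(Items) do
    if WideCompareText(WKey, UTF8Decode(Items[I])) = 0 then
      Exit(I);
  Result := -1;
end;

begin
  WriteLn(FindKey('TWO', ['one', 'two', 'three']));
end.
```

The per-item `UTF8Decode` still costs something, which is why a compare function that works directly on UTF-8 is attractive.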

Anyway, it's clear that functions to handle UTF-8, and Unicode in general, are missing in FPC...
I've tried to make a faster function for UTF-8:

... maybe your function could serve as a base for future development. Perhaps add a new function to the widestringmanager?

Luiz
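uses
  {$IFDEF useunicodinfo}unicodeinfo,{$ENDIF} // Theo's unit, see the wiki link below
  LCLProc; // provides UTF8CharacterToUnicode

function UTF8CompareText(s1, s2: UTF8String): Integer;
var
  u1, u2: UCS4Char;
  {$IFDEF useunicodinfo}
  u1l, u2l: LongInt;
  {$ENDIF}
  BytePos1, Len1, SLen1: Integer;
  BytePos2, Len2, SLen2: Integer;
begin
  Result := 0;
  BytePos1 := 1;
  BytePos2 := 1;
  SLen1 := System.Length(s1);
  SLen2 := System.Length(s2);

  // No byte-length shortcut: upper- and lowercase forms do not always have
  // the same UTF-8 length (e.g. U+0130, 2 bytes, lowercases to 'i', 1 byte).
  // A while loop (instead of repeat) also avoids indexing empty strings.
  while (BytePos1 <= SLen1) and (BytePos2 <= SLen2) do
  begin
    // Decode one code point from each string and advance past its bytes
    u1 := UTF8CharacterToUnicode(@s1[BytePos1], Len1);
    inc(BytePos1, Len1);
    u2 := UTF8CharacterToUnicode(@s2[BytePos2], Len2);
    inc(BytePos2, Len2);
    if u1 <> u2 then
    begin
      // Fold case only when the raw code points differ
      {$IFDEF useunicodinfo}
      u1l := unicodeinfo.utf8proc_get_property(u1)^.lowercase_mapping;
      if u1l <> -1 then u1 := u1l;
      u2l := unicodeinfo.utf8proc_get_property(u2)^.lowercase_mapping;
      if u2l <> -1 then u2 := u2l;
      {$ELSE}
      // WideChar holds only 16 bits, so leave code points above $FFFF as-is
      if u1 <= $FFFF then u1 := UCS4Char(WideUpperCase(WideChar(u1))[1]);
      if u2 <= $FFFF then u2 := UCS4Char(WideUpperCase(WideChar(u2))[1]);
      {$ENDIF}
      if u1 <> u2 then
      begin
        // The sign reflects the order of the first differing code point
        if u1 < u2 then Result := -1 else Result := 1;
        exit;
      end;
    end;
  end;

  // All compared code points matched: the string with bytes left sorts last
  if BytePos1 <= SLen1 then Result := 1
  else if BytePos2 <= SLen2 then Result := -1;
end;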


Some numbers from my system (Linux), where "WideCompareText" is the function
you use now, "WideUpperCase" is the function above, and "unicodeinfo" is
the function above with useunicodinfo defined. See
http://wiki.lazarus.freepascal.org/Theodp


Comparing identical strings of 322 chars, 10000 times:
WideCompareText: 785 ms
unicodeinfo: 75 ms
WideUpperCase: 74 ms

Comparing strings of 322 chars, 10000 times, where the 3rd char differs:
WideCompareText: 268 ms
unicodeinfo: 3 ms
WideUpperCase: 8 ms

Comparing strings of 322 chars that differ only in case (one is all
uppercase), 10000 times:
WideCompareText: 810 ms
unicodeinfo: 121 ms
WideUpperCase: 1076 ms
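The post does not say how these timings were taken. One way to reproduce the WideCompareText row is a small harness like the sketch below. The test data (322 ASCII chars, the second string all uppercase) and the name `TimeWideCompareText` are assumptions for illustration, and `GetTickCount64` has millisecond-level resolution, so absolute numbers will differ by machine.

```pascal
program CompareBench;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // real widestring manager on Unix
  SysUtils;

// Hypothetical harness: runs the UTF8Decode + WideCompareText comparison
// Runs times and returns the elapsed milliseconds. Sum receives the summed
// comparison results (0 when every pair is equal ignoring case).
function TimeWideCompareText(const A, B: UTF8String; Runs: Integer;
  out Sum: PtrInt): QWord;
var
  I: Integer;
  T0: QWord;
begin
  Sum := 0;
  T0 := GetTickCount64;
  for I := 1 to Runs do
    Sum := Sum + WideCompareText(UTF8Decode(A), UTF8Decode(B));
  Result := GetTickCount64 - T0;
end;

var
  A, B: UTF8String;
  Sum: PtrInt;
  Ms: QWord;
begin
  // Assumed test data: same text, second string all uppercase
  A := StringOfChar('a', 322);
  B := StringOfChar('A', 322);
  Ms := TimeWideCompareText(A, B, 10000, Sum);
  WriteLn('WideCompareText: ', Ms, ' ms (result sum ', Sum, ')');
end.
```

Timing the UTF8CompareText variants the same way needs LCLProc (and the unicodeinfo unit), so they are left out of this self-contained sketch.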

Regards Theo

_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal