On Tue, 3 Jul 2018, Marcos Douglas B. Santos wrote:

On Tue, Jul 3, 2018 at 7:50 AM, Michael Van Canneyt
<mich...@freepascal.org> wrote:

On Tue, 3 Jul 2018, Marco van de Voort wrote:
Trivial indeed, till you need more fine-grained control.
e.g. C needs to be an array of chars that mark word boundaries etc.

But I managed to solve the problem with regexps...

How?

I misunderstood how Split works. The regex is the 'word separator' in that
function.
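
To illustrate the point about Split, here is a minimal sketch. It assumes the stock regexpr unit and plain ASCII input rather than the Unicode build mentioned in this thread, and SplitWords is a hypothetical helper name used only for this example. The expression passed to the regex describes what lies between the words, not the words themselves:

```pascal
program splitdemo;

{$mode objfpc}
{$H+}

uses
  classes, regexpr;

// Hypothetical helper (not from the original mail): returns the pieces of
// AText that lie between runs of non-letter characters.
// The caller owns the returned list and must free it.
function SplitWords(const AText: String): TStringList;
var
  R: TRegExpr;
begin
  Result := TStringList.Create;
  R := TRegExpr.Create;
  try
    // The expression is the separator: one or more non-letters.
    R.Expression := '[^A-Za-z]+';
    // Split adds the text between separator matches to the list.
    R.Split(AText, Result);
  finally
    R.Free;
  end;
end;

var
  Words: TStringList;
  I: Integer;
begin
  Words := SplitWords('one, two; three');
  try
    for I := 0 to Words.Count - 1 do
      Writeln('Found: ', Words[I]);
  finally
    Words.Free;
  end;
end.
```

Note that Split also adds whatever follows the last separator, so an input that ends in a separator yields an empty final piece.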

The following correctly gives me all words. Unit uregexpr is the regexp unit
compiled for Unicode.

Michael.

--------------

{$mode objfpc}
{$H+}
uses cwstring, sysutils, classes, uregexpr;

Var
  Split : TStringList;
  S : String;
  R : TRegexpr;
  E : TEncoding;

begin
  Split:=TStringList.Create;
  E:=TEncoding.UTF8;
  Split.LoadFromFile(ParamStr(1),E);
  S:=Split.Text;
  r := TRegExpr.Create;
  try
    r.spaceChars:=r.spaceChars+'|&@#"''(§^!{})-[]*%`=+/.;:,?';
    r.LineSeparators:=#10;
    r.Expression :='(\b[^\d\s]+\b)';
    if r.Exec(S) then
      repeat
        // Match[0] is the whole match. It avoids indexing S with offsets
        // that refer to the Unicode string the regex actually searched.
        Writeln('Found: ', r.Match[0]);
      until not r.ExecNext;
  finally
    r.Free;
    Split.Free;
  end;
end.
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
