On Mon, 3 Jul 2023 17:18:56 +0700 Hairy Pixels via fpc-pascal <fpc-pascal@lists.freepascal.org> wrote:
>[...]
> > First of all: Is it valid UTF-8 or do you have to check for broken
> > or malicious sequences?
>
> If they give the parser broken files, that's their problem and they
> need to fix it? The user has control over the file, so it's their
> responsibility, I think.

User's responsibility? - I recommend checking for malicious codes. ;)

> >> Right now I've just read the file into an AnsiString and am
> >> indexing assuming a fixed character size, which breaks of course
> >> if non-1-byte characters exist
> >
> > Sounds like UTF8CodepointToUnicode in unit LazUTF8 could be useful:
> >
> > function UTF8CodepointToUnicode(p: PChar; out CodepointLen:
> > integer): Cardinal;
>
> Not sure how this works. You need to advance by character, so the
> return value should be the byte location of the next character or
> something like that.

// returns the number of codepoints
function ReadUTF8(p: PChar; ByteCount: PtrInt): PtrInt;
var
  CodePointLen: longint;
  CodePoint: longword;
begin
  Result:=0;
  while (ByteCount>0) do begin
    inc(Result);
    // decode one codepoint; CodePointLen receives its length in bytes
    CodePoint:=UTF8CodepointToUnicode(p,CodePointLen);
    // ...do something with the CodePoint...
    inc(p,CodePointLen);           // advance to the next codepoint
    dec(ByteCount,CodePointLen);
  end;
end;

> >> I also need to know if I come across something like \u1F496 I need
> >> to convert that to a unicode character.
> >
> > I guess you know how to convert a hex to a dword.
>
> Is there anything better than StrToInt?

Good start.

> I wouldn't be able to do it
> myself though without that function.

Hex to dword. That's easy enough for ChatGPT.

> > function UnicodeToUTF8(CodePoint: cardinal): string; // UTF32 to UTF8
> > function UnicodeToUTF8(CodePoint: cardinal; Buf: PChar): integer;
> >   // UTF32 to UTF8
>
> Ok I think this is basically what the other programmer submitted and
> what ChatGPT tried to do.

Yes, no need to reinvent the wheel.

Mattias
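
For the hex-to-dword step discussed above, a minimal sketch using only SysUtils might look like the following. The helper name HexToCodePoint and its range checks are assumptions for illustration, not something from the thread or from LazUTF8; it expects only the hex digits that follow "\u".

```pascal
program HexEscapeDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper (not part of LazUTF8): converts the hex digits of an
// escape such as \u1F496 into a code point. Returns False for non-hex input,
// values above $10FFFF, and UTF-16 surrogates ($D800..$DFFF).
function HexToCodePoint(const Hex: string; out CodePoint: Cardinal): Boolean;
var
  Value: Int64;
begin
  CodePoint := 0;
  // The '$' prefix makes TryStrToInt64 parse the digits as hexadecimal.
  Result := (Hex <> '') and (Length(Hex) <= 6)
    and TryStrToInt64('$' + Hex, Value)
    and (Value >= 0) and (Value <= $10FFFF)
    and not ((Value >= $D800) and (Value <= $DFFF));
  if Result then
    CodePoint := Cardinal(Value);
end;

var
  CP: Cardinal;
begin
  if HexToCodePoint('1F496', CP) then
    WriteLn('code point: U+', IntToHex(Int64(CP), 4))
  else
    WriteLn('invalid escape');
end.
```

The resulting Cardinal can then be passed to UnicodeToUTF8 from LazUTF8 (which needs LazUtils on the unit path) to get the UTF-8 bytes.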