On Mon, 3 Jul 2023 11:58:33 +0700 Hairy Pixels via fpc-pascal <fpc-pascal@lists.freepascal.org> wrote:
> > On Jul 3, 2023, at 11:43 AM, Mattias Gaertner via fpc-pascal
> > <fpc-pascal@lists.freepascal.org> wrote:
> > 
> > There is a header byte.
> > 
> > It depends, if you want to check for invalid UTF-8 sequences.
> > 
> > From LazUTF8:
> > 
> > function UTF8CodepointSizeFast(p: PChar): integer;
> > begin
> >   case p^ of
> >     #0..#191   : Result := 1;
> >     #192..#223 : Result := 2;
> >     #224..#239 : Result := 3;
> >     #240..#247 : Result := 4;
> >     else Result := 1; // An optimization + prevents compiler warning about uninitialized Result.
> >   end;
> > end;
> 
> This is a header for the file?

No, it is the header byte of each codepoint. It tells you the length of that codepoint.

> Does that mean the file itself must
> have uniform character sizes?

No.

> I thought the idea was to read the file
> one byte at a time but I don't understand how you would know if a 1
> byte character (like ASCII) was part of a 4 byte character or not.

ASCII is #0..#127, and those bytes encode the same characters in UTF-8.
The bytes that continue a multi-byte character are always #128..#191, so
a byte in #0..#127 can never be part of a 2, 3 or 4 byte character.

Mattias

_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
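To make the "read one byte at a time" question concrete, here is a minimal Free Pascal sketch of walking a UTF-8 string by its header (lead) bytes. It is not taken from LazUTF8: `LeadByteSize` and `CountCodepoints` are hypothetical names, and the input is assumed to already hold UTF-8 bytes. Unlike `UTF8CodepointSizeFast`, it reports continuation bytes ($80..$BF) as invalid starts instead of treating them as 1 byte long, and it checks that every byte after a lead byte really is a continuation byte. It does not reject overlong forms or surrogates, so it is only a partial validator.

```pascal
program Utf8Walk;

{$mode objfpc}{$H+}

{ Hypothetical helper: number of bytes in the UTF-8 sequence that starts
  with lead byte B, or 0 if B cannot start a sequence. }
function LeadByteSize(B: Byte): Integer;
begin
  case B of
    $00..$7F: Result := 1;  // ASCII: the same character in UTF-8
    $80..$BF: Result := 0;  // continuation byte, never a lead byte
    $C0..$DF: Result := 2;
    $E0..$EF: Result := 3;
    $F0..$F7: Result := 4;
  else
    Result := 0;            // $F8..$FF never occur in UTF-8
  end;
end;

{ Hypothetical helper: counts codepoints by reading a lead byte and then
  skipping its continuation bytes. Returns -1 on a malformed or truncated
  sequence (overlong forms and surrogates are NOT detected). }
function CountCodepoints(const S: RawByteString): Integer;
var
  I, J, N: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    N := LeadByteSize(Ord(S[I]));
    if (N = 0) or (I + N - 1 > Length(S)) then
      Exit(-1);
    // every byte after the lead byte must be in $80..$BF
    for J := I + 1 to I + N - 1 do
      if (Ord(S[J]) and $C0) <> $80 then
        Exit(-1);
    Inc(Result);
    Inc(I, N);
  end;
end;

var
  S: RawByteString;
begin
  // 'caf' + U+00E9 (2 bytes) + U+20AC (3 bytes) + U+1F600 (4 bytes)
  S := 'caf'#$C3#$A9#$E2#$82#$AC#$F0#$9F#$98#$80;
  WriteLn(Length(S), ' bytes, ', CountCodepoints(S), ' codepoints');
  // prints: 12 bytes, 6 codepoints

  // a stray continuation byte at the start is rejected
  WriteLn(CountCodepoints(#$A9'abc'));  // prints: -1
end.
```

Because the scan always starts at a lead byte and consumes whole sequences, it never lands in the middle of a multi-byte character, and the file can freely mix 1 to 4 byte characters.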