Andreas Dorn wrote on Fri, 25 Sep 2015:

In the discussion about resourcestrings I read that the RTL now uses codepage-aware strings for FileIO.
So I wonder what kind of codepages do you use for FileIO?

On Windows: UTF-16.

The Windows documentation calls file names an "opaque sequence of WCHARs".
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx

So, for example, converting a file name from the Windows API to UTF-8 can be lossy.
Does the new FPC file API work correctly if a file name contains invalid UTF-16 sequences?

If you use the RTL file APIs with unicodestrings on Windows, then no conversions should occur because
a) we use the UTF-16 Windows APIs
b) all file name helpers are available both with unicodestring and rawbytestring parameters, so the unicodestring ones should be used.

If you use the RTL file APIs with an ansistring variant, then the code page that is used is described at http://wiki.freepascal.org/FPC_Unicode_support#Code_page_settings

Maybe we should add support for detecting invalid UTF-16 sequences in file names returned by Windows APIs and, if there are any, ask for and return the "short/safe name" instead (file~1.txt and the like).

For data that you pass in yourself, there is no problem: either you pass in UTF-16 and it is passed on unmodified, or you use another code page, and then it is your responsibility if it contains invalid data. That can pretty much only happen with UTF-8, and possibly with some single-byte code pages that have undefined bytes, if there are any.
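
To make the idea of "detecting invalid UTF-16 sequences" concrete, here is a minimal sketch. It is written in Python purely for illustration and is not RTL code; the function names has_unpaired_surrogate and to_utf8_lossy are hypothetical. It assumes a file name is available as a list of 16-bit code units, flags any surrogate that is not part of a proper high/low pair, and shows why a replacing conversion to UTF-8 loses information for such names.

```python
def has_unpaired_surrogate(units):
    """Return True if the 16-bit code units are not well-formed UTF-16."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return True
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return True
        i += 1
    return False


def to_utf8_lossy(units):
    """Convert code units to UTF-8, replacing invalid sequences (lossy)."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le", errors="replace").encode("utf-8")


# Two different "opaque" names, each starting with a lone high surrogate.
name_a = [0xD800, 0x0061]
name_b = [0xD801, 0x0061]

print(has_unpaired_surrogate(name_a))                  # invalid UTF-16
print(has_unpaired_surrogate([0xD83D, 0xDE00]))        # valid surrogate pair
print(to_utf8_lossy(name_a) == to_utf8_lossy(name_b))  # distinct names collide
```

A check of this kind would only be needed for names coming back from the OS; once a name is flagged, falling back to the short name avoids handing the caller a string that cannot be converted and passed back in unchanged.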

Assigning a codepage to something that basically is just some raw sequence of bytes from an
external source sounds dangerous to me.

It is.


Jonas

_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal