On Sat, February 20, 2010 01:15, JoshyFun wrote:
> Hello Tomas,
>
> Friday, February 19, 2010, 11:55:39 PM, you wrote:
>
> TH> No, this can't work that way, otherwise output of any accented
> TH> character in one of the Windows codepages would result in the same
> TH> error.
>
> Tested the "wrong" return of stdout:
>
> code page UTF8 - 65001 en Windows
> Length of string: 7
> camión -> Returned written: 6
>
> Source code:
> -------------------------------------
> uses classes,windows;
> var
>   s: ansistring;
>   OutputStream: TStream;
> Begin
>   Writeln('code page UTF8 - 65001 en Windows');
>   OutputStream := THandleStream.Create(GetStdHandle(STD_OUTPUT_HANDLE));
>   s:='cami'+#$C3+#$B3+'n'; //camión
>   writeln('Length of string: ',Length(s));
>   writeln(' -> Returned written: ',OutputStream.write(s[1],Length(s)));
>   OutputStream.free;
> End.

OK, this seems to be the problem. The underlying Win32 API (WriteFile)
is asked to write 7 bytes to a file. However, those 7 bytes correspond
to only 6 characters in UTF-8, and the Win32 API (apparently) returns
the number of written _characters_ rather than the number of written
_bytes_.

The Windows implementation of do_write (an internal wrapper around the
platform-specific API for writing to a file) currently assumes that the
returned number is again a number of bytes, like the length passed in,
and returns it unchanged. That is OK for simple single-byte codepages,
but not for UTF-8. The System routine for file I/O compares the number
of bytes requested with the number returned as actually written and, if
they do not match, treats the mismatch as an I/O error.

Please post a bug report about this. I guess that fixing it may require
a little more thought. One simple fix would be to change the Windows
implementation of do_write so that it only checks for an error value
returned by WriteFile and, if no error is indicated, returns the
original buffer length regardless of the count returned by WriteFile.
However, the number of actually written _characters_ may be useful in
certain cases, so I'm not sure whether it would be better to preserve it
somehow and possibly extend the implementations for other platforms to
provide this value as well.

Tomas

_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
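
To make the "simple fix" above concrete, here is a minimal sketch. It
is not the RTL's actual do_write. TRawWrite, MockUtf8Write and
SketchDoWrite are hypothetical names made up for illustration, and the
mock only imitates the behaviour seen in the test (a count of UTF-8
characters instead of bytes), so that the program runs on any platform
without calling WriteFile.

```pascal
program DoWriteSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for the platform write call (WriteFile on
  // Windows): returns False on error and stores the count it reports.
  TRawWrite = function(const Buf; Len: LongInt; out Reported: LongInt): Boolean;

// Mock (an assumption, not real Win32 code): "writes" everything but
// reports UTF-8 characters, i.e. skips continuation bytes (10xxxxxx).
function MockUtf8Write(const Buf; Len: LongInt; out Reported: LongInt): Boolean;
var
  P: PByte;
  I: LongInt;
begin
  P := @Buf;
  Reported := 0;
  for I := 0 to Len - 1 do
    if (P[I] and $C0) <> $80 then
      Inc(Reported);
  Result := True;
end;

// Sketch of the proposed fix: trust only the error indication and
// return the requested length; the raw count is kept in Reported for
// callers that want the number of characters.
function SketchDoWrite(Raw: TRawWrite; const Buf; Len: LongInt;
  out Reported: LongInt): LongInt;
begin
  if Raw(Buf, Len, Reported) then
    Result := Len
  else
    Result := -1;
end;

var
  S: AnsiString;
  Reported: LongInt;
begin
  S := 'cami' + #$C3 + #$B3 + 'n'; // camión encoded as UTF-8, 7 bytes
  WriteLn('Requested bytes: ', Length(S));
  WriteLn('Returned:        ',
    SketchDoWrite(@MockUtf8Write, S[1], Length(S), Reported));
  WriteLn('Reported by API: ', Reported);
end.
```

With the test string from the quoted program, the caller gets back the
7 bytes it asked for, so the length comparison in the file I/O code no
longer fails, while the 6 reported by the (mocked) API is still
available through the extra out parameter.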