Stefan Sperling wrote on Fri, Oct 21, 2011 at 13:27:28 +0200:
> On Fri, Oct 21, 2011 at 01:20:49PM +0200, Bert Huijben wrote:
> >
> > > -----Original Message-----
> > > From: Daniel Shahaf [mailto:d...@daniel.shahaf.name]
> > > Sent: Friday, 21 October 2011 13:13
> > > To: Tomáš Bihary
> > > Cc: dev@subversion.apache.org
> > > Subject: Re: problems with mimetype of and empty utf8 files in svn 1.7
> > >
> > > What do you expect to happen?
> > >
> > > As to special-casing svn_io_is_binary_data() to handle 0xEFBBBF
> > > correctly... we could do that, I suppose.
> >
> > +1
> > AnkhSVN currently has its own code to remove the binary marking from these
> > specific files.
> >
> > Some 3rd-party Visual Studio features like to add empty files and then
> > later fill them with the real data.
> >
> > This leaves the case where you have just a few characters in a file where
> > you have a BOM at the start, but for our users that case is far less common
> > than this empty file case.
> >
> >     Bert
>
> Fine, here is my patch again, with a log message.
> Can someone run this through the Windows test suite, please? Thanks.

Have you considered patching svn_io_is_binary_data() instead?  As far as
I can see, that would be functionally equivalent to your current patch.

> I don't expect any test failures from this to arise on *nix.
> Manual testing on BSD with files that contain just the UTF-8 BOM
> suggests that the patch works fine.
>
> [[[
> Special-case empty UTF-8 files which have a UTF-8 BOM. Prevents such
> files from being considered binary by default.
>
> * subversion/libsvn_subr/io.c
>   (svn_io_detect_mimetype2): If the block read from disk contains only
>    a UTF-8 BOM, don't return a binary mimetype but indicate to the caller
>    that it should be treated as text.
>
> Reported by: Tomáš Bihary
> ]]]
>
> Index: subversion/libsvn_subr/io.c
> ===================================================================
> --- subversion/libsvn_subr/io.c (revision 1186983)
> +++ subversion/libsvn_subr/io.c (working copy)
> @@ -2968,6 +2968,13 @@ svn_io_detect_mimetype2(const char **mimetype,
>    /* Now close the file.  No use keeping it open any more.  */
>    SVN_ERR(svn_io_file_close(fh, pool));
>
> +  if (amt_read == 3 && block[0] == 0xEF && block[1] == 0xBB && block[2] == 0xBF)
> +    {
> +      /* This is an empty UTF-8 file which only contains the UTF-8 BOM.
> +       * Treat it as plain text. */
> +      return SVN_NO_ERROR;
> +    }
> +
>    if (svn_io_is_binary_data(block, amt_read))
>      *mimetype = generic_binary;
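
For illustration, special-casing the BOM inside the binary check rather
than in its caller could look roughly like the standalone sketch below.
It is not Subversion code: is_bare_utf8_bom() and looks_binary() are
hypothetical names, and the byte ranges and the 15% threshold are assumed
stand-ins for whatever heuristic svn_io_is_binary_data() really applies.
The buffer is read through an unsigned char pointer so that the
comparisons against 0xEF, 0xBB and 0xBF behave the same whether or not
plain char is signed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Subversion: return 1 if the buffer
   consists of exactly the three-byte UTF-8 BOM (EF BB BF), else 0. */
static int
is_bare_utf8_bom(const void *data, size_t len)
{
  const unsigned char *buf = data;

  assert(data != NULL || len == 0);
  return len == 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}

/* Hypothetical stand-in for a binary-detection predicate.  The heuristic
   is an assumption for this sketch: a buffer counts as binary if it
   contains a NUL byte, or if more than 15% of its bytes fall outside
   the ranges 0x07-0x0D and 0x20-0x7F. */
static int
looks_binary(const void *data, size_t len)
{
  const unsigned char *buf = data;
  size_t i;
  size_t odd = 0;

  /* The special case lives in the predicate, so every caller gets it. */
  if (is_bare_utf8_bom(data, len))
    return 0;

  for (i = 0; i < len; i++)
    {
      if (buf[i] == 0)
        return 1;
      if (buf[i] < 0x07 || (buf[i] > 0x0D && buf[i] < 0x20) || buf[i] > 0x7F)
        odd++;
    }

  return odd * 100 > len * 15;
}
```

With the check inside the predicate, a file holding only the BOM is
reported as text to every caller, while a BOM followed by just a few
characters still goes through the ordinary heuristic, which is the
remaining case Bert mentions.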