Stefan Sperling wrote on Fri, Oct 21, 2011 at 13:27:28 +0200:
> On Fri, Oct 21, 2011 at 01:20:49PM +0200, Bert Huijben wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Daniel Shahaf [mailto:d...@daniel.shahaf.name]
> > > Sent: vrijdag 21 oktober 2011 13:13
> > > To: Tomáš Bihary
> > > Cc: dev@subversion.apache.org
> > > Subject: Re: problems with mimetype of and empty utf8 files in svn 1.7
> > > 
> > > What do you expect to happen?
> > > 
> > > As to special-casing svn_io_is_binary_data() to handle 0xEFBBBF
> > > correctly... we could do that, I suppose.
> > 
> > +1
> > AnkhSVN currently has its own code to remove the binary marking from these 
> > specific files.
> > 
> > Some 3rd-party Visual Studio features like to add empty files and then
> > later fill them with the real data.
> > 
> > 
> > 
> > This leaves the case where you have just a few characters in a file where 
> > you have a BOM at the start, but for our users that case is far less common 
> > than this empty file case.
> > 
> > 
> >     Bert
> 
> Fine, here is my patch again, with a log message.
> Can someone run this through the windows test suite, please? Thanks.

Have you considered patching svn_io_is_binary_data()?

(which, it appears, will be functionally equivalent to your current
patch)
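
For illustration, the BOM test could be factored out roughly like this.
This is a standalone sketch, not the real io.c code: is_utf8_bom_only()
is a made-up name, and I am assuming the buffer is handed over as
unsigned bytes so that the comparisons against 0xEF/0xBB/0xBF behave as
intended. svn_io_is_binary_data() could then return "not binary" early
whenever such a check succeeds.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Subversion.
 * Returns 1 if BUF holds exactly the three-byte UTF-8 BOM
 * (0xEF 0xBB 0xBF) and nothing else, 0 otherwise.
 * BUF must be unsigned: with plain (possibly signed) char the
 * comparisons against values above 0x7F would never match. */
int
is_utf8_bom_only(const unsigned char *buf, size_t len)
{
  return len == 3
         && buf[0] == 0xEF
         && buf[1] == 0xBB
         && buf[2] == 0xBF;
}
```

Note that this only covers the "file is nothing but a BOM" case; a BOM
followed by a few characters of text is a separate question.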

> I don't expect any test failures from this to arise on *nix.
> Manual testing on BSD with files that contain just the UTF-8 BOM
> suggests that the patch works fine.
> 
> [[[
> Special-case empty UTF-8 files which have a UTF-8 BOM. Prevents such
> files from being considered binary by default.
> 
> * subversion/libsvn_subr/io.c
>   (svn_io_detect_mimetype2): If the block read from disk contains only
>     a UTF-8 BOM, don't return a binary mimetype but indicate to the caller
>     that it should be treated as text.
> 
> Reported by: Tomáš Bihary
> ]]]
> 
> Index: subversion/libsvn_subr/io.c
> ===================================================================
> --- subversion/libsvn_subr/io.c       (revision 1186983)
> +++ subversion/libsvn_subr/io.c       (working copy)
> @@ -2968,6 +2968,13 @@ svn_io_detect_mimetype2(const char **mimetype,
>    /* Now close the file.  No use keeping it open any more.  */
>    SVN_ERR(svn_io_file_close(fh, pool));
>  
> +  if (amt_read == 3 && block[0] == 0xEF && block[1] == 0xBB && block[2] == 0xBF)
> +    {
> +      /* This is an empty UTF-8 file which only contains the UTF-8 BOM.
> +       * Treat it as plain text. */
> +      return SVN_NO_ERROR;
> +    }
> +  
>    if (svn_io_is_binary_data(block, amt_read))
>      *mimetype = generic_binary;
>  
