On Wed, Nov 18, 2009 at 11:01:51AM +0100, Daniel Näslund wrote: > On Wed, Nov 18, 2009 at 10:49:15AM +0100, Stefan Sperling wrote: > > On Wed, Nov 18, 2009 at 10:37:01AM +0100, Daniel Näslund wrote: > > > Index: subversion/libsvn_subr/stream.c > > > =================================================================== > > > --- subversion/libsvn_subr/stream.c (revision 881392) > > > +++ subversion/libsvn_subr/stream.c (arbetskopia) > > > @@ -1347,3 +1347,44 @@ > > > > > > return SVN_NO_ERROR; > > > } > > > + > > > +svn_error_t * > > > +svn_stream_detect_binary_mimetype(const char **mimetype, > > > + svn_stream_t *stream) > > > +{ > > > + static const char * const generic_binary = "application/octet-stream"; > > > + char block[1024]; > > > + apr_size_t amt_read = sizeof(block); > > > + > > > + /* Default return value is NULL. */ > > > + *mimetype = NULL; > > > + > > > + SVN_ERR(svn_stream_read(stream, block, &amt_read)); > > > + > > > + if (amt_read > 0) > > > + { > > > + apr_size_t i; > > > + apr_size_t binary_count = 0; > > > + > > > + for (i = 0; i < amt_read; i++) > > > + { > > > + if (block[i] == 0) > > > + { > > > + binary_count = amt_read; > > > + break; > > > + } > > > + if ((block[i] < 0x07) > > > + || ((block[i] > 0x0D) && (block[i] < 0x20)) > > > + || (block[i] > 0x7F)) > > > + { > > > + binary_count++; > > > + } > > > > Unless I'm mistaken the "greater 0x7F" check will trigger on *any* UTF-8 > > continuation byte. See http://tools.ietf.org/html/rfc3629#section-3 > > Yes, it will and this code is used for all the autoprops stuff! But > strange results has been hidden by the fact that the detection code > first checks for file endings. That's my guess atleast. A japanese text > would be considered binary! > > As I'm saying further down. I have only duplicated a part of > svn_io_detect_mimetype2() and intend to refactor this part into a helper > func in libsvn_subr.
Using libmagic is possible, at least from a legal point of view. It has a very simple 2-clause BSD-style license so we could link to it. Stefan