Alternatively, count unigrams in the first 1000 characters and compute the
Euclidean distance to the counts from a sample of e.g. an English text, a
French text, a Chinese text, etc.
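
A minimal sketch of that idea, written in Python for brevity. The function names, the normalisation of counts to relative frequencies (so samples of different lengths stay comparable), and the `references` dictionary are all assumptions made for illustration; the suggestion above does not specify them, nor what distance counts as "close enough".

```python
import math
from collections import Counter

def unigram_profile(text, limit=1000):
    """Relative frequency of each character in the first `limit` characters.

    Normalising to frequencies is an assumption made here, not part of
    the original suggestion, which only says to count unigrams.
    """
    sample = text[:limit]
    total = len(sample)
    if total == 0:
        return {}
    return {ch: n / total for ch, n in Counter(sample).items()}

def euclidean_distance(p, q):
    """Euclidean distance between two profiles; a missing character counts as 0."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))

def nearest_reference(text, references):
    """Return (name, distance) of the reference sample closest to `text`.

    `references` is a hypothetical mapping such as
    {"english": "...", "french": "...", "chinese": "..."}.
    """
    profile = unigram_profile(text)
    distances = {
        name: euclidean_distance(profile, unigram_profile(sample))
        for name, sample in references.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]
```

The caller would still have to pick a cut-off: if even the nearest reference is far away, the input is probably not text in any of the sampled languages. In PHP, count_chars() could supply the per-byte counts for the same calculation.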
- Lucas
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Hello Evan,
Monday, February 23, 2004, 8:57:43 PM, you wrote:
>> It would be wise to check for characters from 0 to 31, if they appear
>> then it's almost certainly (but not guaranteed) binary.
EN> Assuming that's decimal, you're including 0x09 0x0a and 0x0d which are,
EN> respectively, tab, line feed, and carriage return [...]
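
Putting the two points together, a rough sketch of the control-character check that leaves out 0x09, 0x0a and 0x0d might look like this in Python. The function name and the 1000-byte sample size are assumptions for illustration, and as noted above the result is a heuristic, not a guarantee.

```python
# Control bytes that routinely appear in plain text:
# tab (0x09), line feed (0x0a) and carriage return (0x0d).
TEXT_CONTROLS = {0x09, 0x0A, 0x0D}

def looks_binary(data: bytes, limit: int = 1000) -> bool:
    """Heuristic: True if the first `limit` bytes contain a byte in the
    range 0..31 other than tab, line feed or carriage return."""
    return any(b < 32 and b not in TEXT_CONTROLS for b in data[:limit])
```

For example, `looks_binary(b"hello\tworld\r\n")` is False, while any sample containing a NUL byte is flagged.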