there's another problem with file -m that i've been bitten by before: it ignores anything after the first 6000 bytes.
so if you've got a mostly-ascii file with some utf-8 characters 8K in, they won't be picked up and the file gets reported as plain ascii. i think file -m should read the whole file, but that's just my opinion.
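
to make the failure mode concrete, here's a rough python sketch of the difference between sniffing a prefix and scanning the whole file. this is not how file works internally; `guess_encoding` and its `limit` parameter are made up for illustration, and the 6000-byte cutoff is just the number i quoted above.

```python
import codecs
import io


def guess_encoding(stream, limit=None, chunk_size=4096):
    """Classify a binary stream as 'ascii', 'utf-8', or 'unknown'.

    limit=None scans the whole stream; an integer scans only that many
    leading bytes (hypothetical stand-in for a fixed sniffing window).
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    saw_non_ascii = False
    hit_eof = False
    remaining = limit

    while remaining is None or remaining > 0:
        size = chunk_size if remaining is None else min(chunk_size, remaining)
        chunk = stream.read(size)
        if not chunk:
            hit_eof = True
            break
        if remaining is not None:
            remaining -= len(chunk)
        try:
            # Incremental decoding tolerates a multi-byte character
            # that is split across two chunks.
            decoder.decode(chunk)
        except UnicodeDecodeError:
            return "unknown"
        if any(byte >= 0x80 for byte in chunk):
            saw_non_ascii = True

    if hit_eof:
        try:
            # Only at real end-of-file do we insist that no character
            # is left half-finished.
            decoder.decode(b"", final=True)
        except UnicodeDecodeError:
            return "unknown"

    return "utf-8" if saw_non_ascii else "ascii"


# Mostly-ASCII data with one UTF-8 character 8K in.
data = b"a" * 8192 + "\u00e9".encode("utf-8") + b"\n"

print(guess_encoding(io.BytesIO(data), limit=6000))  # ascii
print(guess_encoding(io.BytesIO(data)))              # utf-8
```

reading in chunks keeps memory flat even on huge files, so scanning everything costs time rather than memory, which is the tradeoff i'd personally accept for a correct answer.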