John Machin wrote:
> The approach that I've adopted is to test the values in a column for all
> types, and choose the non-text type that has the highest success rate
> (provided the rate is greater than some threshold e.g. 90%, otherwise
> it's text).
>
> For large files, taking a 1/N sample can save a lot of time with little
> chance of misdiagnosis.

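For reference, the quoted approach might look roughly like the sketch below. It is not John's actual code: the function name `guess_column_type`, the three candidate types, the single date format and the `step` parameter (standing in for the 1/N sample) are all assumptions made for illustration.

```python
import datetime


def _parses(convert, text):
    """Return True if convert(text) succeeds without a ValueError."""
    try:
        convert(text)
        return True
    except ValueError:
        return False


def _to_date(text):
    # Assumed format; real data would need a list of accepted formats.
    return datetime.datetime.strptime(text, "%Y-%m-%d")


# Candidate non-text types, most specific first. On a tie the earlier
# entry wins, so a column of whole numbers comes out as "int", not "float".
CHECKERS = [("int", int), ("float", float), ("date", _to_date)]


def guess_column_type(values, threshold=0.9, step=1):
    """Guess a column's type from every step-th non-empty value."""
    sample = [v.strip() for v in values[::step] if v.strip()]
    if not sample:
        return "text"
    best_name, best_rate = "text", 0.0
    for name, convert in CHECKERS:
        rate = sum(1 for v in sample if _parses(convert, v)) / len(sample)
        if rate > best_rate:
            best_name, best_rate = name, rate
    # Fall back to text unless the best success rate beats the threshold.
    return best_name if best_rate > threshold else "text"
```

Calling `guess_column_type(column, step=10)` tests every tenth value only. Note that `float` also accepts strings such as "nan" and "inf", which a real implementation may want to reject.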
Why stop there? You could lower the minimum 1/N by straightforward
application of Bayesian statistics, using results from previous tables as
priors.

James
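One way to read the Bayesian suggestion, as a sketch only: treat the success rate of a candidate type as unknown, put a Beta prior on it built from pseudo-counts taken from earlier tables, and stop testing values as soon as the posterior is decisive either way. The names `prob_rate_exceeds` and `decide_sequentially`, the integer pseudo-count prior and the 95% confidence level are all assumptions here, not anything from the thread.

```python
from math import comb


def prob_rate_exceeds(threshold, hits, misses):
    """P(rate > threshold) under Beta(hits, misses), integer parameters >= 1.

    Uses the identity between the Beta CDF with integer parameters and a
    binomial tail sum, so nothing outside the standard library is needed.
    """
    m = hits + misses - 1
    return sum(
        comb(m, j) * threshold**j * (1 - threshold) ** (m - j)
        for j in range(hits)
    )


def decide_sequentially(values, check, prior_hits=1, prior_misses=1,
                        threshold=0.9, confidence=0.95):
    """Test values one at a time until the posterior is decisive.

    prior_hits and prior_misses are pseudo-counts (1, 1 is a flat prior).
    Returns (is_this_type, number_of_values_tested).
    """
    hits, misses, tested = prior_hits, prior_misses, 0
    for value in values:
        tested += 1
        if check(value):
            hits += 1
        else:
            misses += 1
        p = prob_rate_exceeds(threshold, hits, misses)
        if p >= confidence:
            return True, tested
        if p <= 1 - confidence:
            return False, tested
    # Ran out of data: go with whichever side is more probable.
    return prob_rate_exceeds(threshold, hits, misses) >= 0.5, tested
```

With a flat prior the loop needs a fair number of clean values before it commits; with a prior such as `prior_hits=50, prior_misses=1` it commits much sooner. How heavily earlier tables should be weighted is a modelling choice that this sketch leaves open.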