On 2007-05-20, John Machin <[EMAIL PROTECTED]> wrote:
> On 19/05/2007 3:14 PM, Paddy wrote:
>> On May 19, 12:07 am, py_genetic <[EMAIL PROTECTED]> wrote:
>>> Hello,
>>>
>>> I'm importing large text files of data using csv. I would like to add
>>> some more auto-sensing abilities. I'm considering sampling the data
>>> file and doing some fuzzy logic scoring on the attributes (columns in a
>>> database/CSV file, e.g. height, weight, income etc.) to determine the
>>> most efficient 'type' to convert the attribute column into for further
>>> processing and efficient storage...
>>>
>>> Example row from sampled file data: [ ['8', '2.33', 'A', 'BB', 'hello
>>> there', '100,000,000,000'], [next row...] ....]
>>>
>>> Aside from a missing attribute designator, we can assume that the same
>>> type of data continues through a column. For example, a string, int8,
>>> int16, float etc.
>>>
>>> 1. What is the most efficient way in Python to test whether a string
>>> can be converted into a given numeric type, or left alone if it's
>>> really a string like 'A' or 'hello'? Speed is key. Any thoughts?
>>>
>>> 2. Is there anything out there already which deals with this issue?
>>>
>>> Thanks,
>>> Conor
>>
>> You might try investigating what can generate your data. With luck,
>> it could turn out that the data generator is methodical and column
>> data-types are consistent and easily determined by testing the
>> first or second row. At worst, you will get to know how much you
>> must check for human errors.
>>
>
> Here you go, Paddy, the following has been generated very methodically;
> what data type is the first column? What is the value in the first
> column of the 6th row likely to be?
>
> "$39,082.00","$123,456.78"
> "$39,113.00","$124,218.10"
> "$39,141.00","$124,973.76"
> "$39,172.00","$125,806.92"
> "$39,202.00","$126,593.21"
>
> N.B. I've kindly given you five lines instead of one or two :-)

My experience with Excel-related mistakes leads me to think that
column one contains dates that somehow got misformatted on export.

--
Neil Cerutti
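
One way to check the misformatted-dates guess is to strip the currency formatting from column one and read the remaining whole numbers as Excel date serials. The following is only a minimal sketch: it assumes Excel's default 1900 date system, and the helper names (currency_to_serial, serial_to_date) are made up for illustration, not taken from any library.

```python
from datetime import date, timedelta

# Assumption: Excel's 1900 date system. For serials past the phantom
# 1900-02-29, serial 0 lines up with 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def currency_to_serial(text):
    """Drop '$' and thousands separators, keep the whole-number part."""
    return int(float(text.replace("$", "").replace(",", "")))


def serial_to_date(serial):
    """Interpret an integer as an Excel date serial (hypothetical helper)."""
    return EXCEL_EPOCH + timedelta(days=serial)


# First column of the five sample rows quoted above.
first_column = [
    "$39,082.00",
    "$39,113.00",
    "$39,141.00",
    "$39,172.00",
    "$39,202.00",
]

decoded = [serial_to_date(currency_to_serial(v)) for v in first_column]

for value, day in zip(first_column, decoded):
    print(value, "->", day.isoformat())
```

Under that assumption the five values come out as month-end dates, 2006-12-31 through 2007-04-30, so the first column of the 6th row would plausibly be 2007-05-31, which is serial 39233 and would have been exported as "$39,233.00".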