On May 19, 12:07 am, py_genetic <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I'm importing large text files of data using csv. I would like to add
> some more auto-sensing abilities. I'm considering sampling the data
> file and doing some fuzzy-logic scoring on the attributes (columns in a
> database/CSV file, e.g. height, weight, income, etc.) to determine the
> most efficient 'type' to convert each attribute column into for further
> processing and efficient storage...
>
> Example row from sampled file data:
> [ ['8', '2.33', 'A', 'BB', 'hello there', '100,000,000,000'], [next row...] ....]
>
> Aside from a missing attribute designator, we can assume that the same
> type of data continues through a column: for example, a string, int8,
> int16, float, etc.
>
> 1. What is the most efficient way in Python to test whether a string
> can be converted into a given numeric type, or left alone if it's
> really a string like 'A' or 'hello'? Speed is key. Any thoughts?
>
> 2. Is there anything out there already which deals with this issue?
>
> Thanks,
> Conor
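For question 1, one common idiom is simply to attempt the conversion and catch ValueError. The sketch below is an illustration only, not code from either poster: classify, infer_column_types and smallest_int_bits are hypothetical names, and it assumes that commas are thousands separators and that an empty field means a missing value. It widens each column's type (int, then float, then str) over a sample of rows and picks the narrowest signed integer width for the int8/int16 question.

```python
import csv
import io

# Widening order: a column's type only ever moves right along this ranking.
_ORDER = {'int': 0, 'float': 1, 'str': 2}


def classify(value):
    """Return 'int', 'float' or 'str' for one field, or None if it is empty.

    Assumption: commas are thousands separators, so they are stripped first.
    """
    s = value.strip().replace(',', '')
    if not s:
        return None  # missing attribute: gives no evidence about the type
    try:
        int(s)
        return 'int'
    except ValueError:
        pass
    try:
        float(s)
        return 'float'
    except ValueError:
        return 'str'


def infer_column_types(rows, sample_size=100):
    """Widen each column's type over at most the first sample_size rows."""
    types = []
    for i, row in enumerate(rows):
        if i >= sample_size:
            break
        for col, value in enumerate(row):
            if col >= len(types):
                types.append(None)
            t = classify(value)
            if t is not None and (types[col] is None
                                  or _ORDER[t] > _ORDER[types[col]]):
                types[col] = t
    # Columns that were empty in every sampled row default to 'str'.
    return ['str' if t is None else t for t in types]


def smallest_int_bits(values):
    """Smallest signed width (8, 16, 32 or 64 bits) holding every value, else None."""
    lo, hi = min(values), max(values)
    for bits in (8, 16, 32, 64):
        if -(1 << (bits - 1)) <= lo and hi < (1 << (bits - 1)):
            return bits
    return None


# Usage on two rows shaped like the example above (second row has a missing value).
sample = io.StringIO('8,2.33,A,BB,hello there,"100,000,000,000"\n'
                     '12,,B,CC,bye,5\n')
rows = list(csv.reader(sample))
print(infer_column_types(rows))  # ['int', 'float', 'str', 'str', 'str', 'int']
print(smallest_int_bits([int(r[5].replace(',', '')) for r in rows]))  # 64
```

Only a bounded sample is scanned, on the assumption that the column type stays consistent through the file; a value later in the file that does not fit would still need handling at conversion time. For question 2, numpy.genfromtxt with dtype=None also infers per-column types, if a NumPy dependency is acceptable.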
You might try investigating what generates your data. With luck, the data generator will turn out to be methodical, with consistent column data types that are easily determined by testing the first or second row. At worst, you will find out how much checking for human error you need to do.

- Paddy.

--
http://mail.python.org/mailman/listinfo/python-list