On Jun 20, 6:01 am, TYR <[EMAIL PROTECTED]> wrote:
> OK, this ought to be simple. I'm parsing a large text file (originally
> a database dump) in order to process the contents back into a SQLite3
> database. The data looks like this:
>
> 'AAA','PF',-17.416666666667,-145.5,'Anaa, French Polynesia','Pacific/
> Tahiti','Anaa';'AAB','AU',-26.75,141,'Arrabury, Queensland,
> Australia','?','?';'AAC','EG',31.133333333333,33.8,'Al Arish,
> Egypt','Africa/Cairo','El Arish International';'AAE','DZ',
> 36.833333333333,8,'Annaba','Africa/Algiers','Rabah Bitat';
>
> which goes on for another 308 lines. As keen and agile minds will no
> doubt spot, the rows are separated by a ; so it should be simple to
> parse it using a regex. So, I establish a db connection and cursor,
> create the table, and open the source file.

Using pyparsing, you can skip all that "what happens if there is a
semicolon or comma inside a quoted string?" noise, and get the data in
a trice. If you add results names (as I've done in the example), then
loading each record into your db should be equally simple.

Here is a pyparsing extractor for you. The parse actions already
convert the numbers to floats and strip the quotation marks.

-- Paul


# One record per line here, so that mail line-wrapping cannot split a
# quoted string; the lines are joined back into a single string below.
data = """
'AAA','PF',-17.416666666667,-145.5,'Anaa, French Polynesia','Pacific/Tahiti','Anaa';
'AAB','AU',-26.75,141,'Arrabury, Queensland, Australia','?','?';
'AAC','EG',31.133333333333,33.8,'Al Arish, Egypt','Africa/Cairo','El Arish International';
'AAE','DZ',36.833333333333,8,'Annaba','Africa/Algiers','Rabah Bitat';
""".splitlines()
data = "".join(data)

from pyparsing import Regex, Suppress, sglQuotedString, removeQuotes

# signed integer or decimal, converted to float at parse time
num = Regex(r'-?\d+(\.\d+)?')
num.setParseAction(lambda t: float(t[0]))

# single-quoted string, with the enclosing quotes stripped
qs = sglQuotedString.setParseAction(removeQuotes)

# delimiters are matched but left out of the results
CMA = Suppress(',')
SEMI = Suppress(';')

# The first number is the latitude and the second the longitude
# (-145.5 is outside the valid latitude range of -90 to 90).
dataRow = qs("field1") + CMA + qs("field2") + CMA + \
    num("lat") + CMA + num("long") + CMA + qs("city") + CMA + \
    qs("tz") + CMA + qs("field7") + SEMI

for dr in dataRow.searchString(data):
    print dr.dump()
    print dr.city, dr.lat, dr.long


Prints:

['AAA', 'PF', -17.416666666666998, -145.5, 'Anaa, French Polynesia', 'Pacific/Tahiti', 'Anaa']
- city: Anaa, French Polynesia
- field1: AAA
- field2: PF
- field7: Anaa
- lat: -17.4166666667
- long: -145.5
- tz: Pacific/Tahiti
Anaa, French Polynesia -17.4166666667 -145.5
['AAB', 'AU', -26.75, 141.0, 'Arrabury, Queensland, Australia', '?', '?']
- city: Arrabury, Queensland, Australia
- field1: AAB
- field2: AU
- field7: ?
- lat: -26.75
- long: 141.0
- tz: ?
Arrabury, Queensland, Australia -26.75 141.0
['AAC', 'EG', 31.133333333332999, 33.799999999999997, 'Al Arish, Egypt', 'Africa/Cairo', 'El Arish International']
- city: Al Arish, Egypt
- field1: AAC
- field2: EG
- field7: El Arish International
- lat: 31.1333333333
- long: 33.8
- tz: Africa/Cairo
Al Arish, Egypt 31.1333333333 33.8
['AAE', 'DZ', 36.833333333333002, 8.0, 'Annaba', 'Africa/Algiers', 'Rabah Bitat']
- city: Annaba
- field1: AAE
- field2: DZ
- field7: Rabah Bitat
- lat: 36.8333333333
- long: 8.0
- tz: Africa/Algiers
Annaba 36.8333333333 8.0
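
As for loading each record into the db: here is a minimal sketch of that step, not tested against the full file. It assumes each parsed record has been turned into a dict keyed by the results names (pyparsing's asDict() on a ParseResults gives that). The table name "locations" and the column names are made up for illustration, and the two rows are typed in by hand from the sample data so that the snippet runs without pyparsing installed.

```python
import sqlite3

# Hypothetical rows, shaped like the dicts that dr.asDict() would give
# for the first two sample records (keys are the results names).
rows = [
    {"field1": "AAA", "field2": "PF", "lat": -17.416666666667, "long": -145.5,
     "city": "Anaa, French Polynesia", "tz": "Pacific/Tahiti", "field7": "Anaa"},
    {"field1": "AAB", "field2": "AU", "lat": -26.75, "long": 141.0,
     "city": "Arrabury, Queensland, Australia", "tz": "?", "field7": "?"},
]

conn = sqlite3.connect(":memory:")  # in-memory database, just for the sketch

# Table and column names are assumptions, not taken from the original dump.
conn.execute(
    "CREATE TABLE locations ("
    "field1 TEXT, field2 TEXT, latitude REAL, longitude REAL, "
    "city TEXT, tz TEXT, field7 TEXT)"
)

# Named placeholders (:name) are filled from the matching dict keys,
# so the order of keys inside each dict does not matter.
conn.executemany(
    "INSERT INTO locations VALUES "
    "(:field1, :field2, :lat, :long, :city, :tz, :field7)",
    rows,
)
conn.commit()

# Read the rows back to check what was stored.
loaded = conn.execute(
    "SELECT field1, city, latitude, longitude FROM locations ORDER BY field1"
).fetchall()
for record in loaded:
    print(record)
```

With the real parser, the hand-written list would be replaced by something like a list of dr.asDict() results collected in the searchString loop. Letting sqlite3 fill the placeholders also means that commas and quotes inside the city names need no escaping.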