Lorn Davies wrote: > Hi there, I'm a Python newbie hoping for some direction in working with > text files that range from 100MB to 1G in size. Basically certain rows, > sorted by the first (primary) field maybe second (date), need to be > copied and written to their own file, and some string manipulations > need to happen as well. An example of the current format: > > XYZ,04JAN1993,9:30:27,28.87,7600,40,0,Z,N > XYZ,04JAN1993,9:30:28,28.87,1600,40,0,Z,N > | > | followed by like a million rows similar to the above, with > | incrementing date and time, and then on to next primary field > | > ABC,04JAN1993,9:30:27,28.875,7600,40,0,Z,N > | > | etc., there are usually 10-20 of the first field per file > | so there's a lot of repetition going on > | > > The export would ideally look like this where the first field would be > written as the name of the file (XYZ.txt): > > 19930104, 93027, 2887, 7600, 40, 0, Z, N > > Pretty ambitious for a newbie? I really hope not. I've been looking at > simpleParse, but it's a bit intense at first glance... not sure where > to start, or even if I need to go that route. Any help from you guys in > what direction to go or how to approach this would be hugely > appreciated. > > Best regards, > Lorn
You could use the csv module. Here's the example from the manual with your sample data in a file named simple.csv: import csv reader = csv.reader(file("some.csv")) for row in reader: print row """ ['XYZ', '04JAN1993', '9:30:27', '28.87', '7600', '40', '0', 'Z', 'N '] ['XYZ', '04JAN1993', '9:30:28', '28.87', '1600', '40', '0', 'Z', 'N '] ['ABC', '04JAN1993', '9:30:27', '28.875', '7600', '40', '0', 'Z', 'N '] """ The csv module while bring each line in as a list of strings. Of course, you want to process each line before printing it. And you don't just want to print it, you want to write it to a file. So after reading the first line, open a file for writing with the first field (row[0]) as the file name. Then you want to process fields row[1], row[2] and row[3] to get them in the right format and then write all the row fields except row[0] to the file that's open for writing. On every subsequent line you must check to see if row[0] has changed, so you'll have to store row[0] in a variable. If it's changed, close the file you've been writing to and open a new file with the new row[0]. Then continue processing lines as before. It will only be this simple if you can guarantee that the original file is actually sorted by the first field. -- http://mail.python.org/mailman/listinfo/python-list