[EMAIL PROTECTED] wrote: > Lorn Davies wrote: > > Hi there, I'm a Python newbie hoping for some direction in working > with > > text files that range from 100MB to 1G in size. Basically certain > rows, > > sorted by the first (primary) field maybe second (date), need to be > > copied and written to their own file, and some string manipulations > > need to happen as well. An example of the current format: > > > > XYZ,04JAN1993,9:30:27,28.87,7600,40,0,Z,N > > XYZ,04JAN1993,9:30:28,28.87,1600,40,0,Z,N > > | > > | followed by like a million rows similar to the above, with > > | incrementing date and time, and then on to next primary field > > | > > ABC,04JAN1993,9:30:27,28.875,7600,40,0,Z,N > > | > > | etc., there are usually 10-20 of the first field per file > > | so there's a lot of repetition going on > > | > > > > The export would ideally look like this where the first field would > be > > written as the name of the file (XYZ.txt): > > > > 19930104, 93027, 2887, 7600, 40, 0, Z, N > > > > Pretty ambitious for a newbie? I really hope not. I've been looking > at > > simpleParse, but it's a bit intense at first glance... not sure where > > to start, or even if I need to go that route. Any help from you guys > in > > what direction to go or how to approach this would be hugely > > appreciated. > > > > Best regards, > > Lorn > > You could use the csv module. > > Here's the example from the manual with your sample data in a file > named simple.csv:
Obviously, I meant "some.csv". Make sure the name in the program matches the file you want to process, or pass the input file name to the program as an argument. > > import csv > reader = csv.reader(file("some.csv")) > for row in reader: > print row > > """ > ['XYZ', '04JAN1993', '9:30:27', '28.87', '7600', '40', '0', 'Z', 'N '] > ['XYZ', '04JAN1993', '9:30:28', '28.87', '1600', '40', '0', 'Z', 'N '] > ['ABC', '04JAN1993', '9:30:27', '28.875', '7600', '40', '0', 'Z', 'N '] > """ > > The csv module while bring each line in as a list of strings. > Of course, you want to process each line before printing it. > And you don't just want to print it, you want to write it to a file. > > So after reading the first line, open a file for writing with the > first field (row[0]) as the file name. Then you want to process > fields row[1], row[2] and row[3] to get them in the right format > and then write all the row fields except row[0] to the file that's > open for writing. > > On every subsequent line you must check to see if row[0] has changed, > so you'll have to store row[0] in a variable. If it's changed, close > the file you've been writing to and open a new file with the new > row[0]. Then continue processing lines as before. > > It will only be this simple if you can guarantee that the original > file is actually sorted by the first field. -- http://mail.python.org/mailman/listinfo/python-list