kpp9c wrote:
The input would look like so:
[...]
Attached is a first cut at a parser that uses the raw content of your original email as its input. The net effect is that the parser instance's items attribute holds the items in source order, each with attributes for the parts of its line (name, tape, number, start, end, comment). From this, it should be pretty easy to adjust the times and whatnot.
Cheers,
// m
#!/usr/bin/env python
"""usage: %prog
"""

raw = """I am kind of in a bit of a jam (okay a big jam) and i was hoping that
someone here could give me a quick hand. I had a few pages of time
calculations to do. So, i just started in on them typing them in my
time calculator and writing them in by hand. Now i realize, that i
really need a script to do this because:

1. It turns out there are hundreds of pages of this stuff.
2. I have to do something similar in again soon.
3. By doing it by hand i am introducing wonderful new errors!
4. It all has to be typed up anyway (which means weeks of work and even
more typos!)

The input would like so:

Item_1 TAPE_1 1 00:23 8:23

Item_2 TAPE_1 2 8:23 9:41

Item_3 TAPE_1 3 9:41 10:41
Item_3 TAPE_1 4 10:47 11:19
Item_3 TAPE_1 5 11:21 11:55
Item_3 TAPE_1 6 11:58 12:10
Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound.
Item_3 TAPE_1 8 12:58 24:20 Defect in analog tape sound.

Item_4 TAPE_1 9 24:33
Item_4 TAPE_1 10 25:48
Item_4 TAPE_1 11 29:48
Item_4 TAPE_1 12 31:46
Item_4 TAPE_1 13 34:17 Electronic sounds.
Item_4 TAPE_1 14 35:21
Item_4 TAPE_1 15 36:06
Item_4 TAPE_1 16 37:01 37:38

These are analog tapes that were digitized (on to CD or a digital tape)
that have now been exported as individual files that are meant to be
part of an on-line audio archive. The timings refer to the time display
on the CD or digital tape. The now all have to adjusted so that each
item starts at 0.00 since they have all been edited out of their
context and are now all individual items that start at 00:00. So Item_1
which was started at 00:23 on the tape and ended at 8:23 needs to have
23 seconds subtracted to it so that it says:

Item_1 TAPE_1 1 00:00 08:00

Item_2 TAPE_1 2 08:23 09:41

would change to:

Item_2 TAPE_1 2 00:00 01:18

etc.

but as always you may notice a wrinkle.... some items have many times
(here 6) indicated:

Item_3 TAPE_1 3 9:41 10:41
Item_3 TAPE_1 4 10:47 11:19
Item_3 TAPE_1 5 11:21 11:55
Item_3 TAPE_1 6 11:58 12:10
Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound.
Item_3 TAPE_1 8 12:58 24:20 Defect in analog tape sound.

This is all a single sound file and these separate times mark where
there was a break, defect, or edit in the individual item. These have
to be adjusted as well to show where these events would appear in the
new sound file which now starts at 00:00.

Item_3 TAPE_1 3 00:00 01:00 ----
Item_3 TAPE_1 4 01:00 01:38 ----
Item_3 TAPE_1 5 01:38 02:14 ----
Item_3 TAPE_1 6 02:14 02:29 ----
Item_3 TAPE_1 7 02:29 03:04 Defect in analog tape sound.
Item_3 TAPE_1 8 03:04 14:39 Defect in analog tape sound.

Further wrinkles: Some have start and end times indicated, some only
start times. I suppose that the output would ideally have both.... some
have comments and others don't ... and I need these comments echo-ed or
since i probably need to make a database or table eventually non
comments just have some place holder.

I'd have a lot of similar type calculations to do... I was hoping and
praying that some one here was feeling generous and show me the way and
then, of course i could modify that to do other tasks... Usually i am
happy to take the long road and all but i'll be honest, i am in a big
jam here and this huge task was just dumped on me. I am frankly a
little desperate for help on this and hoping someone is feeling up to
spoon feeding me a clear modifiable example that works. Sorry.....

cheers,

kevin

--
http://mail.python.org/mailman/listinfo/python-list
"""

import optparse
import re

# Matches a tape time such as 8:23 or 00:23.
timePat = re.compile(r'^\d+:\d\d$')


class Item:
    """One line of the form: name tape number start [end] [comment]."""

    def __init__(self, line):
        # str.split() with no argument splits on runs of whitespace and
        # ignores leading/trailing whitespace, so no empty parts appear.
        parts = line.split()
        self.name, self.tape, self.number, self.start = parts[:4]
        rest = parts[4:]
        # The end time is optional. Only treat the fifth field as the end
        # time if it looks like one; otherwise it is the first word of
        # the comment.
        if rest and timePat.match(rest[0]):
            self.end = rest.pop(0)
        else:
            self.end = None
        # Whatever remains is the comment (None if there is none).
        self.comment = ' '.join(rest) or None


class Parser:
    """Collects Item instances in the order the lines are fed."""

    def __init__(self):
        self.items = []

    def feed(self, line):
        item = Item(line)
        self.items.append(item)


def parseCommandLine(usage, requiredArgCount, argv=None):
    """Parse the command line and return (options, args).

    Raise an error if there are insufficient positional arguments as
    specified by requiredArgCount.
    """
    parser = optparse.OptionParser(usage)
##    parser.add_option('-x',
##                      '--xxx',
##                      action='',
##                      default='',
##                      help='')
    options, args = parser.parse_args(argv)
    if len(args) < requiredArgCount:
        parser.error('Missing parameters.')
    return options, args


def main(argv=None):
    usage = __doc__
    requiredArgCount = 0
    options, args = parseCommandLine(usage, requiredArgCount, argv)
    # No filename argument is read here: this first cut parses the
    # embedded email text, and args[0] would fail when no arguments
    # are given.
    parser = Parser()
    for line in raw.split('\n'):
        # Only lines that begin with 'Item_' are data lines. Note that
        # this also picks up the worked examples quoted in the email.
        if not line.startswith('Item_'):
            continue
        parser.feed(line)


if __name__ == '__main__':
    main()
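
For the time adjustment itself, here is a rough sketch rather than a finished solution. It assumes every time is MM:SS, that rows arrive in source order, and that the first start time of each item is the offset to subtract from all of that item's times (so the gaps between segments are kept, not closed up). The names to_seconds, to_stamp and rebase are made up for illustration, and plain (name, start, end) tuples stand in for the parsed items.

```python
from itertools import groupby


def to_seconds(stamp):
    """Convert an 'MM:SS' string such as '8:23' to a number of seconds."""
    minutes, seconds = stamp.split(':')
    return int(minutes) * 60 + int(seconds)


def to_stamp(total):
    """Format a number of seconds as zero-padded 'MM:SS'."""
    return '%02d:%02d' % divmod(total, 60)


def rebase(rows):
    """Shift times so that each item's first start becomes 00:00.

    rows is a list of (name, start, end) tuples in source order, where
    end may be None. Consecutive rows with the same name are treated as
    segments of one sound file (an assumption, not something the parser
    above enforces).
    """
    out = []
    for name, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        # The first segment's start time is the offset for the whole item.
        offset = to_seconds(group[0][1])
        for _, start, end in group:
            new_start = to_stamp(to_seconds(start) - offset)
            new_end = to_stamp(to_seconds(end) - offset) if end else None
            out.append((name, new_start, new_end))
    return out
```

For example, rebase([('Item_2', '8:23', '9:41')]) gives [('Item_2', '00:00', '01:18')], which matches the Item_2 example in the original email. With the parser above you would build the rows from each item's name, start and end attributes.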