"kpp9c" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > still working on it and also fixing the input data. I think for > simplicity and consistency's sake i will have *all* time values input > and output as hh:mm:ss maybe that would be easier.... but i have a few > thousand find and replaceeseseses to do now (yes i am doing them by > hand) > > grr... this is hard! >
Oh, I wasn't going to chime in on this thread; your data looked so
well-formed that I wouldn't have recommended pyparsing. But there is
enough variability going on here that I thought I'd give it a try.

Here's a pyparsing treatment of your problem. It accommodates trailing
comments or none, leading hours or none on timestamps, and missing end
times, and it normalizes all times back to the item start time.

Most of your processing logic will end up going into the processVals()
routine. I've included various examples of how to access the parsed
tokens by field name, plus some helper functions for converting between
seconds and hh:mm:ss or mm:ss times.

-- Paul


from pyparsing import *

data = """
Item_1 TAPE_1 1 00:23 8:23
Item_2 TAPE_1 2 8:23 9:41
Item_3 TAPE_1 3 9:41 10:41
Item_3 TAPE_1 4 10:47 11:19
Item_3 TAPE_1 5 11:21 11:55
Item_3 TAPE_1 6 11:58 12:10
Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound.
Item_3 TAPE_1 8 12:58 24:20 Defect in analog tape sound.
Item_4 TAPE_1 9 24:33
Item_4 TAPE_1 10 25:48
Item_4 TAPE_1 11 29:48
Item_4 TAPE_1 12 31:46
Item_4 TAPE_1 13 34:17 Electronic sounds.
Item_4 TAPE_1 14 35:21
Item_4 TAPE_1 15 36:06
Item_4 TAPE_1 16 37:01 01:37:38
"""

# convert "mm:ss" or "hh:mm:ss" to a number of seconds
def toSecs(tstr):
    fields = tstr.split(":")
    secs = int(fields[-1])
    secs += int(fields[-2])*60
    if len(fields)>2:
        secs += int(fields[-3])*60*60
    return secs

# convert a number of seconds to "hh:mm:ss"
def secsToTime(secs):
    s = secs % 60
    m = ((secs - s) / 60 ) % 60
    h = (secs >= 3600 and (secs - s - m*60 ) / 3600 or 0)
    return "%02d:%02d:%02d" % (h,m,s)

# globals for normalizing timestamps
lastItem = ""
itemStart = 0

# put logic here for processing various parse fields
def processVals(s,l,t):
    global lastItem,itemStart
    print t.item,t.tape,t.recnum
    if not t.item == lastItem :
        lastItem = t.item
        itemStart = toSecs(t.start)
    startSecs = toSecs(t.start)
    print secsToTime(startSecs),"(%s)" % secsToTime(startSecs-itemStart)
    if t.end:
        endSecs = toSecs(t.end)
        print secsToTime(endSecs),"(%s)" % secsToTime(endSecs-itemStart)
        print endSecs-startSecs,"elapsed seconds"
        print secsToTime(endSecs-startSecs),"elapsed time"
    else:
        print "<no end time>"
    print t.comment
    print

# define structure of a line of data - sorry about the clunkiness of
# the optional trailing fields
integer = Word(nums)
timestr = Combine(integer + ":" + integer + Optional(":" + integer))
dataline = ( Combine("Item_"+integer).setResultsName("item") +
             Combine("TAPE_"+integer).setResultsName("tape") +
             integer.setResultsName("recnum") +
             timestr.setResultsName("start") +
             Optional(~LineEnd() + timestr, default="").setResultsName("end") +
             Optional(~LineEnd() + empty + restOfLine,
                      default="-").setResultsName("comment") )

# set up parse handler that will process the actual fields
dataline.setParseAction(processVals)

# now parse the little buggers
OneOrMore(dataline).parseString(data)


will print out:

Item_1 TAPE_1 1
00:00:23 (00:00:00)
00:08:23 (00:08:00)
480 elapsed seconds
00:08:00 elapsed time
-

Item_2 TAPE_1 2
00:08:23 (00:00:00)
00:09:41 (00:01:18)
78 elapsed seconds
00:01:18 elapsed time
-

Item_3 TAPE_1 3
00:09:41 (00:00:00)
00:10:41 (00:01:00)
60 elapsed seconds
00:01:00 elapsed time
-

Item_3 TAPE_1 4
00:10:47 (00:01:06)
00:11:19 (00:01:38)
32 elapsed seconds
00:00:32 elapsed time
-

Item_3 TAPE_1 5
00:11:21 (00:01:40)
00:11:55 (00:02:14)
34 elapsed seconds
00:00:34 elapsed time
-

...

-- 
http://mail.python.org/mailman/listinfo/python-list
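
A note for anyone running this under Python 3: the script is written
for Python 2 (print statements, and / used for whole-number division in
secsToTime). Under Python 3 the prints become print() calls, and /
yields floats, so divmod is the tidier way to split seconds into
hours, minutes and seconds. Below is a minimal sketch of just the two
time helpers, with no pyparsing dependency. The names to_secs and
secs_to_time are illustrative renamings of toSecs and secsToTime, and
unlike the original, to_secs also accepts a bare seconds value such as
"45"; that extension is an assumption, not something the data needs.

```python
def to_secs(tstr):
    """Convert 'ss', 'mm:ss' or 'hh:mm:ss' to a number of seconds."""
    secs = 0
    # Fold the fields left to right: each step shifts by a factor of 60.
    for field in tstr.split(":"):
        secs = secs * 60 + int(field)
    return secs


def secs_to_time(secs):
    """Convert a non-negative number of seconds to 'hh:mm:ss'."""
    minutes, s = divmod(secs, 60)   # integer quotient and remainder
    h, m = divmod(minutes, 60)
    return "%02d:%02d:%02d" % (h, m, s)


if __name__ == "__main__":
    # First record of the sample data: start 00:23, end 8:23
    start = to_secs("00:23")
    end = to_secs("8:23")
    print(secs_to_time(end - start))  # elapsed time for the record
```

For the first record (00:23 to 8:23) this gives 480 elapsed seconds,
i.e. 00:08:00, matching the output listed above.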