Hi. Thanks for the reply.

I tried what you suggested. What I see now is that the lines print, but none of the regex data does. My initial try gave me the line, then the captured items, then the next line, and so on. What I then tried was a capture/findall of the regex, combining the two outputs in separate loops, which will be ugly but will work:

import re
import sys

ff= "byu2.dat"
#fff= "sdsu2.dat"
with open(ff,"r") as myfile:
    s=myfile.read()

s=s.replace(" ", "")
#with open(fff,"w") as myfile2:
#    myfile2.write(s)

# sample separators:
#<br>#45 / 58#0#
#<br>#45 / 58#0#
#dat1=re.compile("<br>#(\d+) / (\d+)#(\d+)#").search(s).findall()
# findall: one tuple of the three captured numbers per separator
dat1=re.findall(r"<br>#(\d+) / (\d+)#(\d+)#",s)
# split with capture groups: captured numbers are interleaved with the lines
dat=re.compile(r"<br>#(\d+) / (\d+)#(\d+)#").split(s)
# split without capture groups: lines only
dat2 = re.compile(r"<br>#\d+ / \d+#\d+#").split(s)
#dat=re.split('("<br>#(\d+) / (\d+)#(\d+)#")',s)
#dat=re.compile("<br>#(\d+)").split(s)

for m in dat:
    if m:
        print("m = "+m)
#sys.exit()

print("dat1")
print(dat1)
print(len(dat1))
print("dat2a")
#sys.exit()

# for m in dat1:
#     if m:
#         print "m = "+m
#
# #sys.exit()

for m in dat2:
    if m:
        print("m = "+m)
#sys.exit()

sys.exit()

The test data is pasted at http://bpaste.net/show/kYzBUIfhc5023phOVmcu/

Thanks!!

On Thu, Nov 7, 2013 at 1:13 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 07/11/2013 17:45, bruce wrote:
>>
>> update...
>>
>> dat=re.compile("<br>#(\d+) / (\d+)#(\d+)#").split(s)
>>
>> almost works..
>>
>> except i get
>> m = 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL
>> m = 45
>> m = 58
>> m = 0
>> m = 10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL
>> m = 9
>> m = 58
>> m = 0
>>
>> and what i want is:
>> m = 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL 45 / 58,0
>> m = 10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL 9 / 58,0
>>
>>
>> so i'd have the results of the "compile/regex process" to be added to
>> the split lines
>>
>> thoughts/comments??
>>
>> thanks
>>
> The split method also returns what's matched in any capture groups,
> i.e. "(\d+)". Try omitting the parentheses:
>
> dat = re.compile(r"<br>#\d+ / \d+#\d+#").split(s)
>
> You should also be using raw string literals as above (r"..."). It
> doesn't matter in this instance, but it might in others.
>
>>
>>
>> On Thu, Nov 7, 2013 at 12:15 PM, bruce <badoug...@gmail.com> wrote:
>>>
>>> hi.
>>>
>>> got a test file with the sample content listed below:
>>>
>>> the content is one long string, and needs to be split into separate lines
>>>
>>> I'm thinking the pattern to split on should be a kind of regex like::
>>> <br>#45 / 58#0#
>>> or
>>> <br>#9 / 58#0
>>> but i have no idea how to make this happen!!
>>>
>>> if i read the content into a buf -> s
>>>
>>> import re
>>> dat = re.compile("what goes here??").split(s)
>>>
>>> --i'm not sure what goes in the compile() to get the process to work..
>>>
>>> thoughts/comments would be helpful.
>>>
>>> thanks
>>>
>>>
>>> test dat::
>>> 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
>>> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL <br>#45 /
>>> 58#0#10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
>>> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL <br>#9 /
>>> 58#0#10178#000#C S#S#124##001##DAY#Computer Systems#Roper,
>>> Paul#3#MWF<br>#11:00am<br>#11:50am<br>#1170 TMCB <br>#41 /
>>> 145#0#10178#000#C S#S#124##002##DAY#Computer Systems#Roper,
>>> Paul#3#MWF<br>#2:00pm<br>#2:50pm<br>#1170 TMCB <br>#40 /
>>> 120#0#01489#002#C S#S#142##001##DAY#Intro to Computer
>>> Programming#Burton, Robert <div class='instructors'>Seppi, Kevin<br
>>> /></div><span
>>
>>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
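P.S. For the output asked for earlier in the thread (each record followed by its "45 / 58,0" values), one option might be to keep the capture groups in the split and walk the result four items at a time, since re.split() with three groups returns text, group 1, group 2, group 3, text, and so on. This is only a minimal sketch and has not been run against the full byu2.dat file. The names SEP, join_records and sample are made up for illustration, and sample is an abbreviated copy of the quoted test data joined onto one line.

```python
import re

# Separator between records, e.g. "<br>#45 / 58#0#".
# The leading \s* is an assumption: it drops the blank before "<br>".
SEP = re.compile(r"\s*<br>#(\d+) / (\d+)#(\d+)#")


def join_records(s):
    """Return each record with its captured numbers appended as 'a / b,c'."""
    parts = SEP.split(s)
    # With three capture groups, parts looks like:
    #   [text, g1, g2, g3, text, g1, g2, g3, ..., tail]
    # The tail after the last separator is ignored here.
    rows = []
    for i in range(0, len(parts) - 1, 4):
        text, a, b, c = parts[i:i + 4]
        rows.append("%s %s / %s,%s" % (text, a, b, c))
    return rows


# Hypothetical sample: the first two records of the quoted test data.
sample = (
    "10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett, "
    "William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL <br>#45 / 58#0#"
    "10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett, "
    "William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL <br>#9 / 58#0#"
)

for row in join_records(sample):
    print("m = " + row)
```

The "<br>#08:00am" style fields are left alone because the pattern needs digits followed by " / " right after "<br>#". This keeps everything in one pass instead of combining findall and split in separate loops.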