[EMAIL PROTECTED] wrote:
> Hi all,
>
> I've spent all morning trying to work this one out:
>
> I've got the following string:
>
> <td>04/01/2006</td><td>Wednesday</td><td> </td><td>09:14</td><td>12:44</td><td>12:50</td><td>17:58</td><td> </td><td> </td><td> </td><td> </td><td>08:14</td>
>
> from which I'm attempting to extract the date, and the five times from
> into a list. Only the very last time is guaranteed to be there so it
> should also work for a line like:
>
> <td>03/01/2006</td><td>Tuesday</td><td>Annual_Holiday</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td>08:00</td>
>
> My Python regular expression to match that is currently:
>
> digs = re.compile(
> r'<td>(\d{2}\/\d{2}\/\d{4})</td>.*?(?:<td>(\d+\:\d+)</td>).*$' )
>
> which first extracts the date into group 1
> then matches the tags between the date and the first instance of a time
> into group 2
> then matches the first instance of a time into group 3
> but then group 4 grabs all the remaining string.
>
> I've tried changing the time pattern into
>
> (?:<td>(\d+\:\d+)</td>)+
>
> but that doesn't seem to mean "grab one or more cases of the previous
> regexp."
>
> Any Python regexp gurus with a hint would be greatly appreciated.
>
> M@
>

This works:

import BeautifulSoup

test = '<td>04/01/2006</td>' \
       '<td>Wednesday</td>' \
       '<td> </td>' \
       '<td>09:14</td>' \
       '<td>12:44</td>' \
       '<td>12:50</td>' \
       '<td>17:58</td>' \
       '<td> </td>' \
       '<td> </td>' \
       '<td> </td>' \
       '<td> </td>' \
       '<td>08:14</td>'

c=BeautifulSoup.BeautifulSoup(test)
times=[]
for i in c.childGenerator():
    if i.contents[0] == " ": continue
    times.append(i.contents[0])

date=times.pop(0)
day=times.pop(0)
print "date=", date
print "day=", day
print "times=", times

-Larry Bates
-- 
http://mail.python.org/mailman/listinfo/python-list
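
For anyone who would rather stay with the re module, here is a minimal sketch, not the approach posted above. It assumes Python 3 and that every time sits alone in its own <td> cell as in the two sample rows. The names DATE_RE, TIME_RE and parse_row are made up for this example. A repeated group such as (?:<td>(\d+\:\d+)</td>)+ keeps only the last repetition in its capture, which is probably why the second attempt in the question did not collect every time. Calling findall with a single-cell pattern sidesteps that.

```python
import re

# The two sample rows from the question.
row = (
    '<td>04/01/2006</td><td>Wednesday</td><td> </td>'
    '<td>09:14</td><td>12:44</td><td>12:50</td><td>17:58</td>'
    '<td> </td><td> </td><td> </td><td> </td><td>08:14</td>'
)
holiday = (
    '<td>03/01/2006</td><td>Tuesday</td><td>Annual_Holiday</td>'
    '<td> </td><td> </td><td> </td><td> </td>'
    '<td> </td><td> </td><td> </td><td> </td><td>08:00</td>'
)

# Hypothetical helper patterns: one matches the date cell, one a single time cell.
DATE_RE = re.compile(r'<td>(\d{2}/\d{2}/\d{4})</td>')
TIME_RE = re.compile(r'<td>(\d{1,2}:\d{2})</td>')


def parse_row(line):
    """Return (date, list_of_times) for one row, or None if there is no date cell."""
    m = DATE_RE.search(line)
    if m is None:
        return None
    # Start scanning for time cells just past the date cell.
    times = TIME_RE.findall(line, m.end())
    return m.group(1), times


# A repeated capturing group only remembers its final repetition:
last_only = re.search(r'(?:<td>(\d+:\d+)</td>)+', row).group(1)

if __name__ == '__main__':
    print(parse_row(row))
    print(parse_row(holiday))
    print(last_only)
```

Blank cells and the Annual_Holiday cell are simply skipped because they do not match TIME_RE, so a row with only the final time yields a one-element list. If the real pages have attributes on the <td> tags or nested markup, a parser such as BeautifulSoup is likely the safer choice.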