On 04/03/2014 16:26, Alan Gauld wrote:
My turn to ask a question.
This has me pulling my hair out. Hopefully it's something obvious...

I'm trying to pull some dates out of an HTML web page generated
from an Excel spreadsheet.

I've simplified things somewhat, so the file (sample.htm) looks like:

<html>
<body link=blue vlink=purple>

<table border=0 cellpadding=0 cellspacing=0 width=752
style='border-collapse:
  collapse;table-layout:fixed;width:564pt'>
  <tr class=xl66 height=21 style='height:15.75pt'>
   <td height=21 class=xl66 width=64
style='height:15.75pt;width:48pt'>ItemID</td>
   <td class=xl66 width=115 style='width:86pt'>Name</td>
   <td class=xl66 width=99 style='width:74pt'>DateLent</td>
   <td class=xl66 width=121 style='width:91pt'>DateReturned</td>
  </tr>
  <tr height=20 style='height:15.0pt'>
   <td height=20 align=right style='height:15.0pt'>1</td>
   <td>LawnMower</td>
   <td>Small Hover mower</td>
   <td>Fred</td>
   <td>Joe</td>
   <td class=xl65 align=right>4/1/2012</td>
   <td class=xl65 align=right>4/26/2012</td>
  </tr>
</table>
</body>
</html>

The code looks like:

import html.parser

class SampleParser(html.parser.HTMLParser):
     def __init__(self):
         super().__init__()
         self.isDate = False

     def handle_starttag(self, name, attributes):
         if name == 'td':
             for key, value in attributes:
                 if key == 'class':
                    print ('Class Value: ',repr(value))
                    if value.endswith('165'):
                       print ('We got a date')
                       self.isDate = True
                    break

     def handle_endtag(self,name):
         self.isDate = False

     def handle_data(self, data):
         if self.isDate:
             print('Date: ', data)

if __name__ == '__main__':
     print('start test')
     htm = open('sample.htm').read()
     parser = SampleParser()
     parser.feed(htm)
     print('end test')

And the output looks like:

start test
Class Value:  'xl66'
Class Value:  'xl66'
Class Value:  'xl66'
Class Value:  'xl66'
Class Value:  'xl65'
Class Value:  'xl65'
end test

As you can see I'm picking up the class attribute and
its value but the conditional test for x165 is failing.

I've tried

if value == 'x165'
if 'x165' in value

and every other test I can think of.

Why am I not seeing the "We got a date" message?

PS.
Please don't suggest other modules/packages etc,
I'm using html.parser for a reason.

Frustratedly,

Steven has pointed out the symptoms. The cause: should have gone to Specsavers. :)
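
For anyone reading this in the archive: the printed class value is 'xl65' with a lowercase letter "l", while the tests in the posted code look for '165' and 'x165' with the digit "1", so none of them can match. A minimal sketch of a corrected parser follows, still using only html.parser. The names DateParser, DATE_CLASS, SAMPLE, in_date and dates are made up for illustration, and SAMPLE is a cut-down stand-in for sample.htm rather than the real file.

```python
import html.parser


class DateParser(html.parser.HTMLParser):
    """Collect the text of <td> cells whose class attribute is 'xl65'."""

    DATE_CLASS = 'xl65'  # lowercase letter 'l', not the digit '1'

    def __init__(self):
        super().__init__()
        self.in_date = False
        self.dates = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            # attrs is a list of (name, value) pairs
            self.in_date = dict(attrs).get('class') == self.DATE_CLASS

    def handle_endtag(self, tag):
        self.in_date = False

    def handle_data(self, data):
        if self.in_date:
            self.dates.append(data.strip())


# Hypothetical, shortened version of the spreadsheet export.
SAMPLE = """<table>
 <tr><td class=xl66>DateLent</td><td class=xl66>DateReturned</td></tr>
 <tr><td class=xl65 align=right>4/1/2012</td>
     <td class=xl65 align=right>4/26/2012</td></tr>
</table>"""

parser = DateParser()
parser.feed(SAMPLE)
print(parser.dates)               # ['4/1/2012', '4/26/2012']
print('xl65'.endswith('165'))     # False: the original test never matched
```

Keeping the class name in a single constant means it only has to be typed (or pasted from the HTML) once, which makes an l/1 mix-up easier to spot.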

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
