Hello there,
Depending on the model and firmware version of an HP printer, its index page contains varying combinations of the following strings:
"hp", "HP", "color", "Color", "Printer", "Printer Status", "Status:", "Device:", "Device Status", "laserjet", "LaserJet"
How can I determine whether a site is in fact the Web interface of an HP printer? The goal is to remove all HP printers from a list of publicly available Web sites. I've tried the approach below, but it gets messy quickly when I try to account for every combination that HP uses:
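import urllib2

f = urllib2.urlopen("http://%s" % host)
data = f.read()
f.close()
# Each alternative needs its own "in data" test; a bare string such as 'hp' is always true.
if (('hp' in data or 'HP' in data) and ('color' in data or 'Color' in data)
        and ('Printer' in data or 'Printer Status' in data)):
    pass  # DISREGARD THE IP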
I'm sure there's a more graceful way to go about this while maintaining a high degree of accuracy and as few false positives as possible. Any tips or pointers?
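For what it's worth, here is a rough sketch of the direction I'm leaning: match case-insensitively so that "hp"/"HP" and "laserjet"/"LaserJet" collapse into one test each, and require one hit from each keyword group. The names HP_PRINTER_PATTERNS and looks_like_hp_printer are just placeholders I made up, and the groups are guesses based on the strings listed above, not a verified fingerprint of HP's pages.

```python
import re

# Hypothetical keyword groups; a page must match every group to count.
# These patterns are guesses drawn from the strings seen so far.
HP_PRINTER_PATTERNS = [
    re.compile(r"\bhp\b", re.IGNORECASE),            # "hp" or "HP" as a whole word
    re.compile(r"laserjet|printer", re.IGNORECASE),  # product wording
    re.compile(r"status|device", re.IGNORECASE),     # status-page wording
]


def looks_like_hp_printer(page):
    """Return True if every keyword group appears somewhere in the page text."""
    return all(pattern.search(page) for pattern in HP_PRINTER_PATTERNS)
```

The data string from the read() call above would be passed straight in. I have no idea yet how many false positives this produces on real pages, which is really what I'm asking about.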
Thanks in advance!