samuels <[EMAIL PROTECTED]> wrote:
> Hello All,
>
> I am a total Python newbie, and I need help writing a script.
>
> This is what I want to do:
>
> There is a list of links at http://www.rentalhq.com/fulllist.asp. Each
> link goes to a page like
> http://www.rentalhq.com/store.asp?id=907%2F272%2D4425, that contains a
> company name, address, phone, and fax. I want to extract each page, parse
> this information, and export it to a comma-delimited or tab-delimited
> text file. The important information in each page is:
>
> <table border="0" cellpadding="0" cellspacing="0"
> style="border-collapse: collapse" bordercolor="#111111" width="100%"
> id="AutoNumber1">
> <tr>
> <td width="100%" colspan="2">
> <h2 style="text-align: center; margin-top:2; margin-bottom:2;
> line-height:14px" class="title">
> <font size="4">United Rentals Inc.</font>
> </h2>
>
> <h3 style="text-align: center; margin-top:4;
> margin-bottom:4">3401 Commercial Dr.
> Anchorage AK, 99501-3024
> </h3>
> <p style="text-align: center; margin-top:4; margin-bottom:4">
> <a target="_blank"
> href="http://maps.google.com/maps?q=3401+Commercial+Dr%2E Anchorage AK
> 99501-3024 ">
> <!-- <a target="_blank"
> href="http://www.mapquest.com/maps/map.adp?city=Anchorage&state=AK&address=3401+Commercial+Dr.&zip=99501-3024&country=&zoom=8">-->
> <img height="15" src="Scraps/Rental_Images/map.gif" width="33"
> border="0"></a>
> </p>
> </td>
> </tr>
> <tr>
> <td width="50%" valign="top">
> <p style="text-align: center; line-height:100%; margin-top:0;
> margin-bottom:0">
> </p>
> <p style="text-align: center; line-height: 100%; margin-top:0;
> margin-bottom:0">
> <b>Phone</b> - 907/272-4425<br>
> <b>Fax</b> - 907/272-9683 </p>
>
> So from that I want output like:
>
> United Rentals Inc.,3401 Commercial Dr.,Anchorage,AK,"995013024","9072724425","9072729683"
>
> or
>
> United Rentals Inc. 3401 Commercial Dr. Anchorage AK 995013024 9072724425 9072729683
>
> I have been messing around with Beautiful Soup
> (http://www.crummy.com/software/BeautifulSoup/index.html) but haven't
> gotten very far (especially because the HTML is so sloppy).
>
> Any help would be really appreciated! Just point me in the right
> direction, what to use, examples... Thanks!

I'm sure others will give a proper Python solution. But here, the shell is
not a bad tool.

    lynx -dump 'http://www.rentalhq.com/store.asp?id=907%2F272%2D4425' | \
        awk '/Return to List of Rental Stores/,/To reserve an item/' | \
        sed -n -e '3p;5p;10p;11p'

gives me

    United Rentals Inc.
    3401 Commercial Dr. Anchorage AK, 99501-3024
    Phone - 907/272-4425
    Fax - 907/272-9683

--
William Park <[EMAIL PROTECTED]>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
           http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
          http://freshmeat.net/projects/bashdiff/
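
For the Python side of the question, here is a minimal sketch that uses only the standard library (html.parser, re, csv) rather than Beautiful Soup. It is not a tested scraper for the live site. It assumes that the company name is the only text inside <h2>, that the address sits inside <h3> with the street and the "City ST, ZIP" part on separate lines (as in the quoted fragment), and that phone and fax appear as "Phone - ..." and "Fax - ...". The names SAMPLE, StoreParser, parse_store, and to_csv are made up for illustration, and SAMPLE is a trimmed copy of the quoted HTML.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Trimmed copy of the HTML fragment quoted in the question (illustrative only).
SAMPLE = """<h2 class="title">
<font size="4">United Rentals Inc.</font>
</h2>
<h3 style="text-align: center">3401 Commercial Dr.
Anchorage AK, 99501-3024
</h3>
<p>
<b>Phone</b> - 907/272-4425<br>
<b>Fax</b> - 907/272-9683 </p>
"""


class StoreParser(HTMLParser):
    """Sort text nodes into three buckets: inside <h2>, inside <h3>, elsewhere."""

    def __init__(self):
        super().__init__()
        self.current = None  # heading tag we are currently inside, if any
        self.name, self.address, self.other = [], [], []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag in ("h2", "h3"):
            self.current = None

    def handle_data(self, data):
        bucket = {"h2": self.name, "h3": self.address}.get(self.current, self.other)
        bucket.append(data)


def _number(label, text):
    """Return only the digits of the number following 'label - ', or ''."""
    m = re.search(label + r"\s*-\s*([\d/-]+)", text)
    return re.sub(r"\D", "", m.group(1)) if m else ""


def parse_store(html):
    """Return [name, street, city, state, zip, phone, fax] for one store page."""
    parser = StoreParser()
    parser.feed(html)
    parser.close()

    name = " ".join("".join(parser.name).split())

    # Assumption: street on the first line, "City ST, ZIP" on the last line.
    lines = [ln.strip() for ln in "".join(parser.address).splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("address block does not have the expected two lines")
    m = re.match(r"(.+?)\s+([A-Z]{2}),?\s*(\d{5}(?:-\d{4})?)$", lines[-1])
    if m is None:
        raise ValueError("could not split city/state/zip: %r" % lines[-1])
    city, state, zip_code = m.group(1), m.group(2), re.sub(r"\D", "", m.group(3))

    other = " ".join("".join(parser.other).split())
    return [name, lines[0], city, state, zip_code,
            _number("Phone", other), _number("Fax", other)]


def to_csv(rows):
    """Serialise rows as comma-delimited text; csv handles any quoting needed."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv([parse_store(SAMPLE)]), end="")
```

Fetching each page (for example with urllib.request.urlopen) and looping over the links in the full list is left out on purpose. A real page probably contains other headings and text, so the <h2>/<h3> assumption would need checking against the actual markup. For tab-delimited output, csv.writer(buf, delimiter="\t") should do.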