On May 12, 1:54 pm, Jetus <[EMAIL PROTECTED]> wrote:
> I am able to download this page (enclosed code), but I then want to
> download a pdf file that I can view in a regular browser by clicking
> on the "view" link. I don't know how to automate this next part of my
> script. It seems like it uses Javascript.
> The line in the page source says
>
> href="javascript:openimagewin('JCCOGetImage.jsp?
> refnum=DN2007036179');" tabindex=-1>
>

1) Use BeautifulSoup to extract the path:

    JCCOGetImage.jsp?refnum=DN2007036179

from the html page.

2) The path is relative to the current url, so if the current url is:

    http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp

then the url of the page you want is:

    http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179

You can use urlparse.urljoin() to join a relative path to the current url:

    import urlparse

    base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
    relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'

    target_url = urlparse.urljoin(base_url, relative_url)
    print target_url

    --output:--
    http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179

3) Python has a webbrowser module that allows you to open urls in a browser. Include the scheme (http://) so the argument is treated as a url:

    import webbrowser

    webbrowser.open("http://www.google.com")

You could also use os.system() or os.startfile() [Windows only] to launch the browser program itself:

    import os

    os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')

    #You don't have to worry about directory names
    #with spaces in them if you use startfile():
    os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')

All the urls you posted give me errors when I try to open them in a browser, so you will have to sort out those problems first.
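
Steps 1 and 2 can be put together roughly as follows. This is only a sketch under a few assumptions: it uses a regular expression from the standard library instead of BeautifulSoup to pull out the path, the html string is a made-up anchor tag modeled on the line quoted from the page source, and extract_image_path is a hypothetical helper name. It is written for Python 3, where urlparse.urljoin() lives at urllib.parse.urljoin().

```python
import re
from urllib.parse import urljoin  # Python 3 location of urlparse.urljoin

# Assumption: a made-up anchor tag modeled on the line quoted from the
# page source; the real page may be laid out differently.
html = ("<a href=\"javascript:openimagewin("
        "'JCCOGetImage.jsp?refnum=DN2007036179');\" tabindex=-1>view</a>")


def extract_image_path(page_source):
    """Return the first path passed to openimagewin('...'), or None."""
    match = re.search(r"openimagewin\('([^']+)'\)", page_source)
    return match.group(1) if match else None


base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = extract_image_path(html)

# Resolve the relative path against the url of the page it came from.
target_url = urljoin(base_url, relative_url)
print(target_url)
```

If the page holds several "view" links, re.findall() with the same pattern would return every path rather than just the first.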