Querying a complex website
Greetings,

I am attempting to use Python to submit a query to the following URL:

https://ramps.uspto.gov/eram/patentMaintFees.do

The page looks simple enough - it requires entering a number into each of 2 form boxes and then selecting from the pull-down.

However, my test scripts have been hung up, apparently because several buttons on the page have the same name. Ideally, I would have the script use the "Get Bibliographic Data" button.

Any assistance would be appreciated.

~Jon
-- http://mail.python.org/mailman/listinfo/python-list
Re: Querying a complex website
On Feb 19, 4:04 pm, 7stud <[EMAIL PROTECTED]> wrote:
> schweet1 wrote:
> > Greetings,
> >
> > I am attempting to use python to submit a query to the following URL:
> >
> > https://ramps.uspto.gov/eram/patentMaintFees.do
> >
> > The page looks simple enough - it requires submitting a number into 2
> > form boxes and then selecting from the pull down.
> >
> > However, my test scripts have been hung up, apparently due to the
> > several buttons on the page having the same name. Ideally, I would
> > have the script use the "Get Bibliographic Data" link.
> >
> > Any assistance would be appreciated.
> >
> > ~Jon
>
> This is the section you are interested in:
>
> -
> value="Retrieve Fees to Pay">
> -
>
> 1) When you click on a submit button on a web page, a request is sent
> out for the web page listed in the action attribute of the <form> tag,
> which in this case is:
>
> /eram/getMaintFeesInfo.do;jsessionid=-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb
>
> The url specified in the action attribute is a relative url. The
> current url in the address bar of your browser window is:
>
> https://ramps.uspto.gov/eram/patentMaintFees.do
>
> and if you compare that to the url in the action attribute of the
> <form> tag:
>
> -
> https://ramps.uspto.gov/eram/patentMaintFees.do
>
> /eram/getMaintFeesInfo.do;jsessionid=-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb
> -
>
> you can piece them together and get the absolute url:
>
> https://ramps.uspto.gov/eram/getMaintFeesInfo.do;jsessionid=-MCoY...
>
> 2) When you click on a submit button, a request is sent to that url.
> The request will contain all the information you entered into the form
> as name/value pairs. The name is whatever is specified in the name
> attribute of a tag and the value is whatever is entered into the form.
>
> Because the submit buttons in the form have name attributes, the name
> and value of the particular submit button that you click will be added
> to the request.
>
> 3) To programmatically mimic what happens in your browser when you
> click on the submit button of a form, you need to send a request
> directly to the url listed in the action attribute of the <form> tag.
> Your request will contain the name/value pairs that would have been
> sent to the server if you had actually filled out the form and clicked
> on the 'Get Bibliographic Data' submit button. The form contains
> these input elements:
>
> value="">
>
> and the submit button you want to click on is this one:
>
> So the name/value pairs you need to include in your request are:
>
> data = {
>     'patentNum':'1234567',
>     'applicationNum':'08123456',
>     'maintFeeAction':'Get Bibliographic Data'
> }
>
> Therefore, try something like this:
>
> import urllib
>
> data = {
>     'patentNum':'1234567',
>     'applicationNum':'08123456',
>     'maintFeeAction':'Get Bibliographic Data'
> }
>
> enc_data = urllib.urlencode(data)
> url = 'https://ramps.uspto.gov/eram/getMaintFeesInfo.do;jsessionid=-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb'
>
> f = urllib.urlopen(url, enc_data)
>
> print f.read()
> f.close()
>
> If that doesn't work, you may need to deal with cookies that the
> server requires in order to keep track of you as you navigate from
> page to page. In that case, please post a valid patent number and
> application number, so that I can do some further tests.

Thanks all - I think there are cookie issues. Here's an example data pair to play with: patent number 6,725,879 (application number 10/102,919). I'll post some of the code I've tried asap.
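For anyone following along, here is a rough sketch of the three steps 7stud describes, with a cookie-aware opener added in case the server does track sessions as suspected. It is written for Python 3 (urllib.request and urllib.parse), whereas the code quoted above is Python 2. The helper names build_form_request, make_cookie_opener and submit are made up for illustration, and whether the server actually accepts a request built this way is untested. The action value, including its jsessionid, is an assumption here and would probably have to be read from a freshly loaded copy of the page.

```python
# Python 3 sketch; helper names are hypothetical, not from the thread.
from http.cookiejar import CookieJar
from urllib.parse import urlencode, urljoin
from urllib.request import HTTPCookieProcessor, build_opener

PAGE_URL = "https://ramps.uspto.gov/eram/patentMaintFees.do"


def build_form_request(page_url, action, fields):
    """Return (absolute_url, encoded_body) for a form POST."""
    # Resolve the form's relative action against the page it came from.
    absolute_url = urljoin(page_url, action)
    # urlencode produces name=value pairs; a POST body must be bytes.
    body = urlencode(fields).encode("ascii")
    return absolute_url, body


def make_cookie_opener():
    """Build an opener that stores cookies and resends them on later requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def submit(opener, page_url, action, fields):
    """Load the form page first (to pick up any cookies), then POST the form."""
    opener.open(page_url).close()
    url, body = build_form_request(page_url, action, fields)
    with opener.open(url, body) as response:
        return response.read()
```

A call would look like submit(make_cookie_opener(), PAGE_URL, action, fields), where fields holds patentNum, applicationNum and the maintFeeAction value of the button to "click". Keeping the URL and body construction separate from the network call makes it possible to check what would be sent before contacting the server.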
Re: Querying a complex website
On Feb 20, 6:06 pm, 7stud <[EMAIL PROTECTED]> wrote:
> 7stud wrote:
> > schweet1 wrote:
> > > On Feb 19, 4:04 pm, 7stud <[EMAIL PROTECTED]> wrote:
> > > > [snip]
Saving tif file from tricky webserver
Greetings,

I am attempting to automate accessing and saving a file (a TIF) from the following URL:

http://patimg1.uspto.gov/.DImg?Docid=US007376435&PageNum=1&IDKey=E21184B8FAD5

I have tried some methods using urllib, httplib, and win32com.client (InternetExplorer), but haven't been successful. Currently I am using (in Python 2.5):

    import webbrowser
    url = 'http://patimg1.uspto.gov/.DImg?Docid=US007376435&PageNum=1&IDKey=E21184B8FAD5'
    webbrowser.open(url, new=0, autoraise=0)

When this is run, a Windows popup dialog opens asking me to Open, Save, or Cancel. However, if I query multiple such URLs, I do not want to have to respond manually. Is there a way I can use Python to save the TIF?
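One possible approach, sketched below, is to fetch the bytes directly and write them to disk instead of handing the URL to a browser. This is a Python 3 sketch (the post above uses Python 2.5), and it assumes the server returns the image to a plain HTTP GET with no cookies or special headers. Since urllib attempts reportedly failed, that assumption may well not hold. The function names filename_for and save_tif, and the file naming scheme, are made up for illustration.

```python
# Python 3 sketch; function names and file naming are hypothetical.
from urllib.parse import parse_qs, urlsplit
from urllib.request import urlopen


def filename_for(url):
    """Derive a local file name from the Docid and PageNum query fields."""
    query = parse_qs(urlsplit(url).query)
    doc_id = query.get("Docid", ["document"])[0]
    page = query.get("PageNum", ["1"])[0]
    return "%s_p%s.tif" % (doc_id, page)


def save_tif(url, path=None):
    """Fetch url and write the response bytes to disk; return the path used."""
    path = path or filename_for(url)
    # Binary mode matters: a TIF is not text.
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

Looping save_tif over a list of such URLs would avoid the Open/Save/Cancel dialog entirely, since no browser is involved.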