Querying a complex website

2008-02-19 Thread schweet1
Greetings,

I am attempting to use python to submit a query to the following URL:

https://ramps.uspto.gov/eram/patentMaintFees.do

The page looks simple enough - it requires submitting a number into 2
form boxes and then selecting from the pull down.

However, my test scripts have been hung up, apparently due to the
several buttons on the page having the same name.  Ideally, I would
have the script use the "Get Bibliographic Data" link.

Any assistance would be appreciated.

~Jon
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Querying a complex website

2008-02-20 Thread schweet1
On Feb 19, 4:04 pm, 7stud <[EMAIL PROTECTED]> wrote:
> schweet1 wrote:
> > Greetings,
>
> > I am attempting to use python to submit a query to the following URL:
>
> >https://ramps.uspto.gov/eram/patentMaintFees.do
>
> > The page looks simple enough - it requires submitting a number into 2
> > form boxes and then selecting from the pull down.
>
> > However, my test scripts have been hung up, apparently due to the
> > several buttons on the page having the same name.  Ideally, I would
> > have the script use the "Get Bibliographic Data" link.
>
> > Any assistance would be appreciated.
>
> > ~Jon
>
> This is the section you are interested in:
>
> -
> [form markup stripped by the list archive; surviving fragment:
> value="Retrieve Fees to Pay"]
> -
>
> 1) When you click on a submit button on a web page, a request is sent
> out for the web page listed in the action attribute of the <form> tag,
> which in this case is:
>
> action="/eram/getMaintFeesInfo.do;jsessionid=-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb"
>
> The url specified in the action attribute is a relative url.  The
> current url in the address bar of your browser window is:
>
> https://ramps.uspto.gov/eram/patentMaintFees.do
>
> and if you compare that to the url in the action attribute of the
> <form> tag:
>
> -
> https://ramps.uspto.gov/eram/patentMaintFees.do
>
> /eram/getMaintFeesInfo.do;jsessionid=-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb
> -
>
> you can piece them together and get the absolute url:
>
> https://ramps.uspto.gov/eram/getMaintFeesInfo.do;jsessionid=-MCoY...
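For what it's worth, the piecing-together in step 1 does not have to be done by hand. A minimal sketch, assuming Python 3 (where the function lives in urllib.parse; in the Python 2 of this thread it is urlparse.urljoin). The action value is the one quoted in this thread, and its jsessionid part will differ from session to session:

```python
from urllib.parse import urljoin

# URL of the page that contains the form (from the original post).
page_url = "https://ramps.uspto.gov/eram/patentMaintFees.do"

# Relative action value as quoted in this thread; the jsessionid part
# is session-specific, so treat this literal as an example only.
action = "/eram/getMaintFeesInfo.do;jsessionid=-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb"

# Resolve the relative action against the page URL, as a browser would.
absolute_url = urljoin(page_url, action)
print(absolute_url)
```

Because the action starts with a slash, only the scheme and host are taken from the page URL; the path comes entirely from the action.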
>
> 2) When you click on a submit button, a request is sent to that url.
> The request will contain all the information you entered into the form
> as name/value pairs.  The name is whatever is specified in the name
> attribute of a tag and the value is whatever is entered into the form.
>
> Because the submit buttons in the form have name attributes,  the name
> and value of the particular submit button that you click will be added
> to the request.
>
> 3)  To programmatically mimic what happens in your browser when you
> click on the submit button of a form, you need to send a request
> directly to the url listed in the action attribute of the <form> tag.
> Your request will contain the name/value pairs that would have been
> sent to the server if you had actually filled out the form and clicked
> on the 'Get Bibliographic Data' submit button.  The form contains
> these input elements:
>
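> [input markup stripped by the list archive; judging by the name/value
> pairs below, the two text fields are named patentNum and applicationNum]
>
> and the submit button you want to click on is this one:
>
> [markup stripped by the list archive; judging by the name/value pairs
> below: name="maintFeeAction", value="Get Bibliographic Data"]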
>
> So the name value pairs you need to include in your request are:
>
> data = {
>     'patentNum':'1234567',
>     'applicationNum':'08123456',
>     'maintFeeAction':'Get Bibliographic Data'
>
> }
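To make the name/value pairs concrete, here is a small sketch of what they look like once encoded into a request body. It assumes Python 3 (urllib.parse.urlencode; the Python 2 spelling used in this thread is urllib.urlencode), and the numbers are the placeholder values from the quoted message, not real patent data:

```python
from urllib.parse import urlencode

# Same placeholder pairs as in the quoted message.
data = {
    "patentNum": "1234567",
    "applicationNum": "08123456",
    "maintFeeAction": "Get Bibliographic Data",
}

# urlencode joins the pairs with '&' and turns spaces into '+'.
body = urlencode(data)
print(body)
# patentNum=1234567&applicationNum=08123456&maintFeeAction=Get+Bibliographic+Data

# In Python 3 a POST body has to be bytes, not str.
post_bytes = body.encode("ascii")
```

The maintFeeAction pair is what tells the server which of the same-named submit buttons was "clicked".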
>
> Therefore, try something like this:
>
> import urllib
>
> data = {
>     'patentNum':'1234567',
>     'applicationNum':'08123456',
>     'maintFeeAction':'Get Bibliographic Data'
>
> }
>
> enc_data = urllib.urlencode(data)
> url = ('https://ramps.uspto.gov/eram/'
>        'getMaintFeesInfo.do;jsessionid=-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb')
>
> f = urllib.urlopen(url, enc_data)
>
> print f.read()
> f.close()
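One caveat on the hard-coded url: a jsessionid embedded in a URL normally belongs to a single session, so the value above will probably stop working once that session expires. A rough sketch of reading the current action and field names out of the page instead, assuming Python 3 and only the standard library. FormScanner, scan_form and SAMPLE_HTML are names made up for this illustration, and the sample markup is hypothetical (including method="post"), not the real page source:

```python
from html.parser import HTMLParser


class FormScanner(HTMLParser):
    """Collect the action of every <form> and every named <input>."""

    def __init__(self):
        super().__init__()
        self.actions = []   # action attribute of each <form>, in page order
        self.inputs = []    # (name, type, value) for each named <input>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.actions.append(attrs.get("action"))
        elif tag == "input" and attrs.get("name"):
            self.inputs.append(
                (attrs["name"], attrs.get("type"), attrs.get("value"))
            )


def scan_form(html):
    """Return (actions, inputs) found in an HTML string."""
    scanner = FormScanner()
    scanner.feed(html)
    return scanner.actions, scanner.inputs


# Hypothetical markup for illustration only; the real page will differ.
SAMPLE_HTML = """
<form action="/eram/getMaintFeesInfo.do;jsessionid=ABC123" method="post">
  <input type="text" name="patentNum" value="">
  <input type="text" name="applicationNum" value="">
  <input type="submit" name="maintFeeAction" value="Get Bibliographic Data">
</form>
"""

actions, inputs = scan_form(SAMPLE_HTML)
print(actions)
print(inputs)
```

Feeding the scanner the HTML of a freshly fetched form page would give an action that matches the session the server has just opened.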
>
> If that doesn't work, you may need to deal with cookies that the
> server requires in order to keep track of you as you navigate from
> page to page.  In that case, please post a valid patent number and
> application number, so that I can do some further tests.

Thanks all - I think there are cookie issues - here's an example data
pair to play with: patent number 6,725,879 (application number
10/102,919).  I'll post some of the code I've tried ASAP.
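If cookies are indeed the problem, something along these lines might be a starting point. It is an untested sketch under several assumptions: Python 3 standard library (http.cookiejar and urllib.request; the Python 2 equivalents are cookielib and urllib2), the field names from the quoted message, and a guess that the form wants the numbers without commas or slashes. make_opener, digits_only, build_post and fetch_bibliographic_data are names invented here:

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode

FORM_PAGE = "https://ramps.uspto.gov/eram/patentMaintFees.do"


def make_opener():
    """Build an opener that stores cookies and sends them back."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def digits_only(text):
    """Assumption: the form expects plain digits, e.g. '6725879'."""
    return "".join(ch for ch in text if ch.isdigit())


def build_post(action_url, patent_num, application_num):
    """Create the POST request a click on the button would send."""
    fields = {
        "patentNum": digits_only(patent_num),
        "applicationNum": digits_only(application_num),
        "maintFeeAction": "Get Bibliographic Data",
    }
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(
        action_url, data=urlencode(fields).encode("ascii")
    )


def fetch_bibliographic_data(action_url, patent_num, application_num):
    """Visit the form page first, then submit with the same cookies."""
    opener, _jar = make_opener()
    opener.open(FORM_PAGE).close()  # lets the server set its session cookie
    request = build_post(action_url, patent_num, application_num)
    with opener.open(request) as response:
        return response.read()
```

Nothing here touches the network until fetch_bibliographic_data is called, e.g. with the absolute action URL and "6,725,879", "10/102,919". The important part is that both requests go through the same opener, so whatever cookie the first response sets is returned with the second.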


Re: Querying a complex website

2008-02-22 Thread schweet1
On Feb 20, 6:06 pm, 7stud <[EMAIL PROTECTED]> wrote:
> 7stud wrote:
> > schweet1 wrote:

Saving tif file from tricky webserver

2008-05-29 Thread schweet1
Greetings,

I am attempting to automate accessing and saving a file (a TIF) from
the following URL:

http://patimg1.uspto.gov/.DImg?Docid=US007376435&PageNum=1&IDKey=E21184B8FAD5

I have tried some methods using urllib, httplib, and
win32com.client (InternetExplorer), but haven't been successful.
Currently I am using (in Python 2.5):

import webbrowser

url = 'http://patimg1.uspto.gov/.DImg?Docid=US007376435&PageNum=1&IDKey=E21184B8FAD5'

webbrowser.open(url, new=0, autoraise=0)

When this is run, a Windows popup dialog opens asking me to Open, Save,
or Cancel.  However, if I query multiple such URLs, I do not want to
have to respond manually.  Is there a way I can use Python to save the
TIF?
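For reference, the general shape of saving a binary response without a browser is sketched below. This assumes Python 3 (urllib.request; urllib2 would be the Python 2.5 counterpart) and does not explain why the earlier urllib attempts failed, which could be down to cookies or headers the server expects. looks_like_tiff, save_if_tiff and download_tiff are names made up for this example:

```python
import urllib.request

# A TIFF file starts with one of these 4-byte signatures
# (little-endian "II" or big-endian "MM" byte order).
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def looks_like_tiff(data):
    """Cheap check that the server sent an image, not an HTML page."""
    return data[:4] in TIFF_MAGIC


def save_if_tiff(data, path):
    """Write the bytes to path only if they look like a TIFF."""
    if not looks_like_tiff(data):
        return False
    # 'wb' matters on Windows: text mode would corrupt the image.
    with open(path, "wb") as out:
        out.write(data)
    return True


def download_tiff(url, path, opener=None):
    """Fetch url and save it; pass a cookie-aware opener if needed."""
    opener = opener or urllib.request.build_opener()
    with opener.open(url) as response:
        return save_if_tiff(response.read(), path)
```

Calling download_tiff(url, "page1.tif") in a loop over several URLs would avoid the Open/Save dialog entirely, since no browser is involved. A False return means the server sent something other than a TIFF, which is worth inspecting before retrying.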