7stud wrote:
> schweet1 wrote:
> > On Feb 19, 4:04 pm, 7stud <[EMAIL PROTECTED]> wrote:
> > > schweet1 wrote:
> > > > Greetings,
> > > >
> > > > I am attempting to use python to submit a query to the following URL:
> > > >
> > > > https://ramps.uspto.gov/eram/patentMaintFees.do
> > > >
> > > > The page looks simple enough - it requires submitting a number into 2
> > > > form boxes and then selecting from the pull down.
> > > >
> > > > However, my test scripts have been hung up, apparently due to the
> > > > several buttons on the page having the same name.  Ideally, I would
> > > > have the script use the "Get Bibliographic Data" link.
> > > >
> > > > Any assistance would be appreciated.
> > > >
> > > > ~Jon
> > >
> > > This is the section you are interested in:
> > >
> > > -------------
> > > <tr>
> > > <td colspan=3><input type="submit" name="maintFeeAction"
> > > value="Retrieve Fees to Pay"> </td>
> > > </tr>
> > >
> > > <tr>
> > > <td colspan=3><input type="submit" name="maintFeeAction"
> > > value="Get Bibliographic Data"> </td>
> > > </tr>
> > >
> > > <tr>
> > > <td colspan=3><input type="submit" name="maintFeeAction"
> > > value="View Payment Windows"> </td>
> > > </tr>
> > > <tr>
> > > ------------
> > >
> > > 1) When you click on a submit button on a web page, a request is sent
> > > out for the web page listed in the action attribute of the <form> tag,
> > > which in this case is:
> > >
> > > <form name="mfInputForm" method="post"
> > > action="/eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb">
> > >
> > > The url specified in the action attribute is a relative url.
> > > The current url in the address bar of your browser window is:
> > >
> > > https://ramps.uspto.gov/eram/patentMaintFees.do
> > >
> > > and if you compare that to the url in the action attribute of the
> > > <form> tag:
> > >
> > > ---------
> > > https://ramps.uspto.gov/eram/patentMaintFees.do
> > >
> > > /eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb
> > > ---------
> > >
> > > you can piece them together and get the absolute url:
> > >
> > > https://ramps.uspto.gov/eram/getMaintFeesInfo.do;jsessionid=0000-MCoY...
> > >
> > > 2) When you click on a submit button, a request is sent to that url.
> > > The request will contain all the information you entered into the form
> > > as name/value pairs.  The name is whatever is specified in the name
> > > attribute of a tag and the value is whatever is entered into the form.
> > >
> > > Because the submit buttons in the form have name attributes, the name
> > > and value of the particular submit button that you click will be added
> > > to the request.
> > >
> > > 3) To programmatically mimic what happens in your browser when you
> > > click on the submit button of a form, you need to send a request
> > > directly to the url listed in the action attribute of the <form>.
> > > Your request will contain the name/value pairs that would have been
> > > sent to the server if you had actually filled out the form and clicked
> > > on the 'Get Bibliographic Data' submit button.
> > > The form contains these input elements:
> > >
> > > ----
> > > <input type="text" name="patentNum" maxlength="7" size="7" value="">
> > >
> > > <input type="text" name="applicationNum" maxlength="8" size="8" value="">
> > > ----
> > >
> > > and the submit button you want to click on is this one:
> > >
> > > <input type="submit" name="maintFeeAction" value="Get Bibliographic Data">
> > >
> > > So the name/value pairs you need to include in your request are:
> > >
> > > data = {
> > >     'patentNum':'1234567',
> > >     'applicationNum':'08123456',
> > >     'maintFeeAction':'Get Bibliographic Data'
> > > }
> > >
> > > Therefore, try something like this:
> > >
> > > import urllib
> > >
> > > data = {
> > >     'patentNum':'1234567',
> > >     'applicationNum':'08123456',
> > >     'maintFeeAction':'Get Bibliographic Data'
> > > }
> > >
> > > enc_data = urllib.urlencode(data)
> > > url = 'https://ramps.uspto.gov/eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb'
> > >
> > > f = urllib.urlopen(url, enc_data)
> > >
> > > print f.read()
> > > f.close()
> > >
> > > If that doesn't work, you may need to deal with cookies that the
> > > server requires in order to keep track of you as you navigate from
> > > page to page.  In that case, please post a valid patent number and
> > > application number, so that I can do some further tests.
> >
> > Thanks all - I think there are cookie issues - here's an example data
> > pair to play with: 6,725,879 (10/102,919). I'll post some of the code
> > I've tried asap.
>
> Ok.
> Here is what your form looks like without all the <tr> and <td> tags:
>
> -------------
> <form name="mfInputForm" method="post"
> action="/eram/getMaintFeesInfo.do;jsessionid=0000U8dQaywwUaYMMuwsl8h4WsX:11g0uehq7">
>
> <input type="text" name="patentNum" maxlength="7" size="7" value="">
> <input type="text" name="applicationNum" maxlength="8" size="8" value="">
>
> <input type="hidden" name="signature" value="52371786cafc8b58d140bb03ae5a1210">
> <input type="hidden" name="loadTime" value="1203546696130">
> <input type="hidden" name="sessionId" value="U8dQaywwUaYMMuwsl8h4WsX">
>
> <input type="submit" name="maintFeeAction" value="Retrieve Fees to Pay">
> <input type="submit" name="maintFeeAction" value="Get Bibliographic Data">
> <input type="submit" name="maintFeeAction" value="View Payment Windows">
> <input type="submit" name="maintFeeAction" value="View Statement">
>
> for Payment Window:
> <select name="maintFeeYear"><option value="04" selected="selected">04</option>
> <option value="08">08</option>
> <option value="12">12</option>
> </select>
>
> </form>
> ----------------
>
> First notice that there is a <select> tag at the bottom that contains
> some information that would be included in the request if you filled
> out the form by hand and clicked on the submit button.  As a result,
> the name/value pair of that <select> tag needs to be included in your
> request.  That requires that you add the following data to your
> request:
>
> 'maintFeeYear':'04'   #...or whatever you want the value to be
>
> Also notice that there are 'hidden' form fields in the form.  They
> look like this:
>
> <input type='hidden' ....>
>
> A hidden form field is not visible on a web page, but just the same
> its name/value pair gets sent to the server when the user submits the
> form.  As a result, you need to include the name/value pairs of the
> hidden form fields in your request.  It so happens that one of the
> hidden form fields is named 'sessionId'.
> That id identifies you as you move from page to page.  If you click on
> a link or a button on a page, a request is sent out for another page,
> and if the request does not contain that sessionId, then the request
> is rejected.
>
> What that means is: you cannot submit a request directly for the page
> you want.  First, you have to send out a request for the page with the
> form on it and then extract some information from it.  What you need
> to do is:
>
> 1) Request the form page.
>
> 2) Extract the name/value pairs in the hidden form fields on the form
> page.  BeautifulSoup is good for doing things like that.  You need to
> add those name/value pairs to the dictionary containing the patent
> number and the application number.
>
> 3) The url in the action attribute of the form looks like this:
>
> action="/eram/getMaintFeesInfo.do;jsessionid=0000U8dQaywwUaYMMuwsl8h4WsX:11g0uehq7"
>
> Note how there is a 'jsessionid' on the end.  What that means is: the
> url in the action attribute changes every time you go to the form
> page.  As a consequence, you cannot know that url beforehand.  Because
> the information you want is at the url listed in the action attribute,
> you have to extract that url from the form page as well.  Once again,
> BeautifulSoup makes that easy to do.
>
> Once you have 1) all the data that is required, and 2) the proper url
> to send your request to, then you can send out your request.
> Here is an example:
>
> import urllib
> import BeautifulSoup as bs
>
> #get the form page:
>
> response1 = urllib.urlopen('https://ramps.uspto.gov/eram/patentMaintFees.do')
>
> #extract the url from the action attribute:
>
> html_doc = bs.BeautifulSoup(response1.read())
> form = html_doc.find('form', attrs={'name':'mfInputForm'})
> action_attr_url = form['action']
> next_page_url = 'https://ramps.uspto.gov' + action_attr_url
>
> #create a dictionary for the data:
>
> form_data = {
>     'patentNum':'6725879',
>     'applicationNum':'10102919',
>     'maintFeeYear': '04',  #<select> name/value
>     'maintFeeAction':'Get Bibliographic Data',  #submit button name/value
> }
>
> #extract the data contained in the hidden form fields
> #hidden form fields look like this: <input type='hidden' ...>
>
> hidden_tags = form.findAll('input', type='hidden')
> for tag in hidden_tags:
>     name = tag['name']
>     value = tag['value']
>     print name, value  #if you want to see what's going on
>
>     form_data[name] = value  #add the data to our dictionary
>
> #format the data and send out the request:
>
> enc_data = urllib.urlencode(form_data)
> response2 = urllib.urlopen(next_page_url, enc_data)
>
> print response2.read()
> response2.close()
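For anyone reading this on Python 3, where urlencode and urlopen live in urllib.parse and urllib.request, the parsing half of the steps above can be sketched with only the standard library. This is a rough sketch under assumptions, not the code from the post: FormScraper, build_request and SAMPLE_FORM are names made up for illustration, the markup is a trimmed copy of the form quoted earlier in the thread, and the network request is left out so that it runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Hypothetical stand-in for the downloaded form page (markup trimmed from the thread).
SAMPLE_FORM = """
<form name="mfInputForm" method="post"
 action="/eram/getMaintFeesInfo.do;jsessionid=0000U8dQaywwUaYMMuwsl8h4WsX:11g0uehq7">
<input type="text" name="patentNum" maxlength="7" size="7" value="">
<input type="hidden" name="signature" value="52371786cafc8b58d140bb03ae5a1210">
<input type="hidden" name="loadTime" value="1203546696130">
<input type="hidden" name="sessionId" value="U8dQaywwUaYMMuwsl8h4WsX">
<input type="submit" name="maintFeeAction" value="Get Bibliographic Data">
</form>
"""


class FormScraper(HTMLParser):
    """Hypothetical helper: records one form's action url and hidden fields."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = None
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("type") == "hidden":
            self.hidden[attrs["name"]] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_request(page_url, html, form_name, visible_fields):
    """Return (absolute action url, urlencoded body) for a later POST."""
    scraper = FormScraper(form_name)
    scraper.feed(html)
    data = dict(visible_fields)
    data.update(scraper.hidden)  # hidden fields have to be sent back too
    # urljoin resolves the relative action url against the form page's url
    return urljoin(page_url, scraper.action), urlencode(data)


url, body = build_request(
    "https://ramps.uspto.gov/eram/patentMaintFees.do",
    SAMPLE_FORM,
    "mfInputForm",
    {
        "patentNum": "6725879",
        "applicationNum": "10102919",
        "maintFeeYear": "04",  # <select> name/value
        "maintFeeAction": "Get Bibliographic Data",  # submit button name/value
    },
)
print(url)
print(body)
```

To actually send it, the body would go to urllib.request.urlopen(url, body.encode('ascii')); in Python 3 the response object works in a with statement, so it gets closed for you. I have not checked whether the server also wants cookies on top of the hidden fields.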
Throw in a response1.close() here:

> response1 = urllib.urlopen('https://ramps.uspto.gov/eram/patentMaintFees.do')
>
> #extract the url from the action attribute:
>
> html_doc = bs.BeautifulSoup(response1.read())
> response1.close()

--
http://mail.python.org/mailman/listinfo/python-list