On Feb 20, 6:06 pm, 7stud <[EMAIL PROTECTED]> wrote:
> 7stud wrote:
> > schweet1 wrote:
> > > On Feb 19, 4:04 pm, 7stud <[EMAIL PROTECTED]> wrote:
> > > > schweet1 wrote:
> > > > > Greetings,
> > > > >
> > > > > I am attempting to use python to submit a query to the following URL:
> > > > >
> > > > > https://ramps.uspto.gov/eram/patentMaintFees.do
> > > > >
> > > > > The page looks simple enough - it requires submitting a number into 2
> > > > > form boxes and then selecting from the pull down.
> > > > >
> > > > > However, my test scripts have been hung up, apparently due to the
> > > > > several buttons on the page having the same name. Ideally, I would
> > > > > have the script use the "Get Bibliographic Data" link.
> > > > >
> > > > > Any assistance would be appreciated.
> > > > >
> > > > > ~Jon
> > > >
> > > > This is the section you are interested in:
> > > >
> > > > -------------
> > > > <tr>
> > > > <td colspan=3><input type="submit" name="maintFeeAction" value="Retrieve Fees to Pay"> </td>
> > > > </tr>
> > > >
> > > > <tr>
> > > > <td colspan=3><input type="submit" name="maintFeeAction" value="Get Bibliographic Data"> </td>
> > > > </tr>
> > > >
> > > > <tr>
> > > > <td colspan=3><input type="submit" name="maintFeeAction" value="View Payment Windows"> </td>
> > > > </tr>
> > > > <tr>
> > > > ------------
> > > >
> > > > 1) When you click on a submit button on a web page, a request is sent
> > > > out for the web page listed in the action attribute of the <form> tag,
> > > > which in this case is:
> > > >
> > > > <form name="mfInputForm" method="post" action="/eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb">
> > > >
> > > > The url specified in the action attribute is a relative url.
> > > > The current url in the address bar of your browser window is:
> > > >
> > > > https://ramps.uspto.gov/eram/patentMaintFees.do
> > > >
> > > > and if you compare that to the url in the action attribute of the
> > > > <form> tag:
> > > >
> > > > ---------
> > > > https://ramps.uspto.gov/eram/patentMaintFees.do
> > > >
> > > > /eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb
> > > > ---------
> > > >
> > > > you can piece them together and get the absolute url:
> > > >
> > > > https://ramps.uspto.gov/eram/getMaintFeesInfo.do;jsessionid=0000-MCoY...
> > > >
> > > > 2) When you click on a submit button, a request is sent to that url.
> > > > The request will contain all the information you entered into the form
> > > > as name/value pairs. The name is whatever is specified in the name
> > > > attribute of a tag and the value is whatever is entered into the form.
> > > >
> > > > Because the submit buttons in the form have name attributes, the name
> > > > and value of the particular submit button that you click will be added
> > > > to the request.
> > > >
> > > > 3) To programmatically mimic what happens in your browser when you
> > > > click on the submit button of a form, you need to send a request
> > > > directly to the url listed in the action attribute of the <form>.
> > > > Your request will contain the name/value pairs that would have been
> > > > sent to the server if you had actually filled out the form and clicked
> > > > on the 'Get Bibliographic Data' submit button.
> > > > The form contains these input elements:
> > > >
> > > > ----
> > > > <input type="text" name="patentNum" maxlength="7" size="7" value="">
> > > >
> > > > <input type="text" name="applicationNum" maxlength="8" size="8" value="">
> > > > ----
> > > >
> > > > and the submit button you want to click on is this one:
> > > >
> > > > <input type="submit" name="maintFeeAction" value="Get Bibliographic Data">
> > > >
> > > > So the name/value pairs you need to include in your request are:
> > > >
> > > > data = {
> > > >     'patentNum':'1234567',
> > > >     'applicationNum':'08123456',
> > > >     'maintFeeAction':'Get Bibliographic Data'
> > > > }
> > > >
> > > > Therefore, try something like this:
> > > >
> > > > import urllib
> > > >
> > > > data = {
> > > >     'patentNum':'1234567',
> > > >     'applicationNum':'08123456',
> > > >     'maintFeeAction':'Get Bibliographic Data'
> > > > }
> > > >
> > > > enc_data = urllib.urlencode(data)
> > > > url = 'https://ramps.uspto.gov/eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb'
> > > >
> > > > f = urllib.urlopen(url, enc_data)
> > > >
> > > > print f.read()
> > > > f.close()
> > > >
> > > > If that doesn't work, you may need to deal with cookies that the
> > > > server requires in order to keep track of you as you navigate from
> > > > page to page. In that case, please post a valid patent number and
> > > > application number, so that I can do some further tests.
> > >
> > > Thanks all - I think there are cookie issues - here's an example data
> > > pair to play with: 6,725,879 (10/102,919). I'll post some of the code
> > > I've tried asap.
> >
> > Ok.
> > Here is what your form looks like without all the <tr> and <td> tags:
> >
> > -------------
> > <form name="mfInputForm" method="post" action="/eram/getMaintFeesInfo.do;jsessionid=0000U8dQaywwUaYMMuwsl8h4WsX:11g0uehq7">
> >
> > <input type="text" name="patentNum" maxlength="7" size="7" value="">
> > <input type="text" name="applicationNum" maxlength="8" size="8" value="">
> >
> > <input type="hidden" name="signature" value="52371786cafc8b58d140bb03ae5a1210">
> > <input type="hidden" name="loadTime" value="1203546696130">
> > <input type="hidden" name="sessionId" value="U8dQaywwUaYMMuwsl8h4WsX">
> >
> > <input type="submit" name="maintFeeAction" value="Retrieve Fees to Pay">
> > <input type="submit" name="maintFeeAction" value="Get Bibliographic Data">
> > <input type="submit" name="maintFeeAction" value="View Payment Windows">
> > <input type="submit" name="maintFeeAction" value="View Statement">
> >
> > for Payment Window:
> > <select name="maintFeeYear"><option value="04" selected="selected">04</option>
> > <option value="08">08</option>
> > <option value="12">12</option>
> > </select>
> >
> > </form>
> > ----------------
> >
> > First notice that there is a <select> tag at the bottom that contains
> > some information that would be included in the request if you filled
> > out the form by hand and clicked on the submit button. As a result,
> > the name/value pair of that <select> tag needs to be included in your
> > request. That requires that you add the following data to your
> > request:
> >
> > 'maintFeeYear':'04' #...or whatever you want the value to be
> >
> > Also notice that there are 'hidden' form fields in the form. They
> > look like this:
> >
> > <input type='hidden' ....>
> >
> > A hidden form field is not visible on a web page, but just the same
> > its name/value pair gets sent to the server when the user submits the
> > form. As a result, you need to include the name/value pairs of the
> > hidden form fields in your request.
> > It so happens that one of the hidden form fields is named
> > 'sessionId'. That id identifies you as you move from page to page.
> > If you click on a link or a button on a page, a request is sent out
> > for another page, and if the request does not contain that sessionId,
> > then the request is rejected.
> >
> > What that means is: you cannot submit a request directly for the page
> > you want. First, you have to send out a request for the page with the
> > form on it and then extract some information from it. What you need
> > to do is:
> >
> > 1) Request the form page.
> >
> > 2) Extract the name/value pairs in the hidden form fields on the form
> > page. BeautifulSoup is good for doing things like that. You need to
> > add those name/value pairs to the dictionary containing the patent
> > number and the application number.
> >
> > 3) The url in the action attribute of the form looks like this:
> >
> > action="/eram/getMaintFeesInfo.do;jsessionid=0000U8dQaywwUaYMMuwsl8h4WsX:11g0uehq7"
> >
> > Note how there is a 'jsessionid' on the end. What that means is: the
> > url in the action attribute changes every time you go to the form
> > page. As a consequence, you cannot know that url beforehand. Because
> > the information you want is at the url listed in the action attribute,
> > you have to extract that url from the form page as well. Once again,
> > BeautifulSoup makes that easy to do.
> >
> > Once you have 1) all the data that is required, and 2) the proper url
> > to send your request to, then you can send out your request.
> > Here is an example:
> >
> > import urllib
> > import BeautifulSoup as bs
> >
> > #get the form page:
> >
> > response1 = urllib.urlopen('https://ramps.uspto.gov/eram/patentMaintFees.do')
> >
> > #extract the url from the action attribute:
> >
> > html_doc = bs.BeautifulSoup(response1.read())
> > form = html_doc.find('form', attrs={'name':'mfInputForm'})
> > action_attr_url = form['action']
> > next_page_url = 'https://ramps.uspto.gov' + action_attr_url
> >
> > #create a dictionary for the data:
> >
> > form_data = {
> >     'patentNum':'6725879',
> >     'applicationNum':'10102919',
> >     'maintFeeYear': '04', #<select> name/value
> >     'maintFeeAction':'Get Bibliographic Data', #submit button name/value
> > }
> >
> > #extract the data contained in the hidden form fields
> > #hidden form fields look like this: <input type='hidden' ...>
> >
> > hidden_tags = form.findAll('input', type='hidden')
> > for tag in hidden_tags:
> >     name = tag['name']
> >     value = tag['value']
> >     print name, value #if you want to see what's going on
> >
> >     form_data[name] = value #add the data to our dictionary
> >
> > #format the data and send out the request:
> >
> > enc_data = urllib.urlencode(form_data)
> > response2 = urllib.urlopen(next_page_url, enc_data)
> >
> > print response2.read()
> > response2.close()
>
> Throw in a response1.close() here:
>
> > response1 = urllib.urlopen('https://ramps.uspto.gov/eram/patentMaintFees.do')
> >
> > #extract the url from the action attribute:
> >
> > html_doc = bs.BeautifulSoup(response1.read())
>
> response1.close()

Thanks a million. This worked for me.
--
http://mail.python.org/mailman/listinfo/python-list
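
Note for anyone reading this in the archive on Python 3: the code quoted above is Python 2 (urllib.urlopen, the print statement, the old BeautifulSoup module). Below is a rough sketch of the same three steps (resolve the action url, collect the hidden fields, encode the form data) using only the standard library. FormScraper and build_request are names made up for illustration, and the HTML is a trimmed copy of the form quoted above rather than a live fetch, so this has not been tried against the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

FORM_PAGE_URL = "https://ramps.uspto.gov/eram/patentMaintFees.do"

# Trimmed copy of the form quoted in the thread, so the sketch runs offline.
SAMPLE_HTML = """
<form name="mfInputForm" method="post"
      action="/eram/getMaintFeesInfo.do;jsessionid=0000U8dQaywwUaYMMuwsl8h4WsX:11g0uehq7">
<input type="text" name="patentNum" maxlength="7" size="7" value="">
<input type="text" name="applicationNum" maxlength="8" size="8" value="">
<input type="hidden" name="signature" value="52371786cafc8b58d140bb03ae5a1210">
<input type="hidden" name="loadTime" value="1203546696130">
<input type="hidden" name="sessionId" value="U8dQaywwUaYMMuwsl8h4WsX">
<input type="submit" name="maintFeeAction" value="Get Bibliographic Data">
</form>
"""


class FormScraper(HTMLParser):
    """Hypothetical helper: records one form's action url and hidden fields."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = None
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("type") == "hidden":
            # Hidden fields are sent along with the visible ones.
            self.hidden[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_request(html, page_url, fields):
    """Return (absolute post url, url-encoded body) for the mfInputForm form."""
    scraper = FormScraper("mfInputForm")
    scraper.feed(html)
    # urljoin resolves the relative action url against the form page url.
    post_url = urljoin(page_url, scraper.action)
    form_data = dict(fields)
    form_data.update(scraper.hidden)
    return post_url, urlencode(form_data).encode("ascii")


if __name__ == "__main__":
    url, body = build_request(SAMPLE_HTML, FORM_PAGE_URL, {
        "patentNum": "6725879",
        "applicationNum": "10102919",
        "maintFeeYear": "04",                        # <select> name/value
        "maintFeeAction": "Get Bibliographic Data",  # submit button name/value
    })
    print(url)
    print(body.decode("ascii"))
```

To actually send it, urllib.request.urlopen(url, body) issues a POST when a body is given. If the server also insists on cookies, an opener built with urllib.request.build_opener(urllib.request.HTTPCookieProcessor()) is the usual standard-library route, though whether this site needs that is a guess on my part.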