I'm able to grab the problem webpage via Python just fine, albeit with
a bit of a delay, so I don't know what your exact problem is. Maybe
it's your connection?
When you get the second page, are you getting the same content
back that you get if you do a search in your favorite browser?
Using just a plain GET, I get an error page instead of the results:
  content = urllib.urlopen(url2).read()
  'Error' in content      # True
  'Friedrich' in content  # False
However, when you browse to the page, those two should be inverted:
  'Error' in content      # False
  'Friedrich' in content  # True
I've tried passing the parameters properly via POST:
  params = urllib.urlencode([
      ('params.forzaQuery', 'N'),
      ...
      ('layout', 'busquedaisbn'),
      ])
  # pass the encoded string as the POST body
  content = urllib.urlopen(url2, params).read()
However, this too fails because the underlying engine expects a
session ID in the URL. I finally got it to work with the code below:
  import urllib
  data = [
      ('params.forzaQuery', 'N'),
      ('params.cdispo', 'A'),
      ('params.cisbnExt', '8484031128'),
      ('params.liConceptosExt[0].texto', ''),
      ('params.orderByFormId', '1'),
      ('action', 'Buscar'),
      ('language', 'es'),
      ('prev_layout', 'busquedaisbn'),
      ('layout', 'busquedaisbn'),
      ]
  params = urllib.urlencode(data)
  # the session ID is part of the URL path, not a POST parameter
  url = ('http://www.mcu.es/webISBN/tituloSimpleDispatch.do'
         ';jsessionid=5E8D9A11E4A28BDF0BA6B254D0118262')
  fp = urllib.urlopen(url, params)  # supplying data makes this a POST
  content = fp.read()
  fp.close()
but I had to hard-code the jsessionid parameter in the URL. This
would have to be determined from the initial call & response of
the initial URL (the initial URL returns a <FORM> element with
the URL to POST to, including this magic jsessionid parameter).
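Something along these lines might do for digging the POST URL out of
that first response. Treat it as a rough sketch, not something I've run
against the real site: find_post_url, search and start_url are names I
made up for illustration, the regex assumes the form's action attribute
is quoted, and the page encoding is a guess. The try/except import is
only there so it runs on both Python 2 and 3.

```python
import re

try:                       # Python 2, as in the code above
    from urllib import urlopen, urlencode
    from urlparse import urljoin
except ImportError:        # Python 3 moved these around
    from urllib.request import urlopen
    from urllib.parse import urlencode, urljoin

# Assumes a quoted action attribute; a real HTML parser would be sturdier.
FORM_ACTION_RE = re.compile(
    r'<form\b[^>]*?\baction\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def find_post_url(page_url, html):
    """Return the absolute URL that the first <FORM> in html posts to."""
    match = FORM_ACTION_RE.search(html)
    if match is None:
        raise ValueError('no <form action="..."> found')
    # The action may be relative, so resolve it against the page's own URL.
    return urljoin(page_url, match.group(1))


def search(start_url, data):
    """Fetch start_url, then POST data to whatever URL its form names."""
    fp = urlopen(start_url)
    try:
        html = fp.read()
    finally:
        fp.close()
    if not isinstance(html, str):      # Python 3 hands back bytes
        html = html.decode('latin-1')  # encoding is a guess
    params = urlencode(data)
    if not isinstance(params, bytes):  # Python 3 wants bytes for POST data
        params = params.encode('ascii')
    fp = urlopen(find_post_url(start_url, html), params)
    try:
        return fp.read()
    finally:
        fp.close()
```

With the data list from above, content = search(start_url, data) would
then replace the hard-coded URL, where start_url is whatever page
serves the search form in the first place.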
Hope this helps nudge you (the OP) in the right direction to get
what you're looking for.
-tkc