I'm able to grab the problem webpage via Python just fine, albeit with
a bit of a delay, so I don't know what your exact problem is. Maybe
it's your connection?

When you get the second page, are you getting the same content back that you get if you do a search in your favorite browser?

Using just

  content = urllib.urlopen(url2).read()
  'Error' in content # True
  'Friedrich' in content # False

However, when you browse to the page, those two should be inverted:

  'Error' in content # False
  'Friedrich' in content # True

I've tried passing the parameters properly via POST:

  params = urllib.urlencode([
    ('params.forzaQuery', 'N'),
...
    ('layout', 'busquedaisbn'),
    ])
  content = urllib.urlopen(url2, params).read()

However, this too fails because the underlying engine expects a session ID in the URL. I finally got it to work with the code below:

  import urllib

  data = [
    ('params.forzaQuery', 'N'),
    ('params.cdispo', 'A'),
    ('params.cisbnExt', '8484031128'),
    ('params.liConceptosExt[0].texto', ''),
    ('params.orderByFormId', '1'),
    ('action', 'Buscar'),
    ('language', 'es'),
    ('prev_layout', 'busquedaisbn'),
    ('layout', 'busquedaisbn'),
    ]

  params = urllib.urlencode(data)

  url = 'http://www.mcu.es/webISBN/tituloSimpleDispatch.do;jsessionid=5E8D9A11E4A28BDF0BA6B254D0118262'

  fp = urllib.urlopen(url, params)
  content = fp.read()
  fp.close()


but I had to hard-code the jsessionid parameter in the URL. In practice it would have to be read from the response to the initial URL, which returns a <FORM> element containing the URL to POST to, including this magic jsessionid parameter.
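
A rough sketch of how that URL might be pulled out of the initial response, instead of hard-coding it. I haven't run this against the live site: the helper name find_post_url, the regular expression, and the sample markup are all my own inventions for illustration, and the real <FORM> tag may well be laid out differently (in which case a proper HTML parser would be the safer choice).

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as used elsewhere in this post

# Assumption: the form's action attribute is quoted and contains
# "jsessionid=". Adjust the pattern if the real page differs.
FORM_ACTION_RE = re.compile(
    r'<form\b[^>]*?\baction\s*=\s*["\']([^"\']*jsessionid=[^"\']*)["\']',
    re.IGNORECASE)

def find_post_url(base_url, html):
    """Return the absolute form-action URL carrying a jsessionid, or None."""
    match = FORM_ACTION_RE.search(html)
    if match is None:
        return None
    # The action may be relative, so resolve it against the page URL.
    return urljoin(base_url, match.group(1))

# Made-up sample markup, only to show the call; not copied from the site.
sample = ('<FORM name="f" method="post" '
          'action="/webISBN/tituloSimpleDispatch.do;jsessionid=ABC123">')
post_url = find_post_url('http://www.mcu.es/webISBN/', sample)
```

The idea would be to read the initial page, pass its HTML to find_post_url, and then hand the result to urllib.urlopen together with params, in place of the hard-coded url above.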

Hope this helps nudge you (the OP) in the right direction to get what you're looking for.

-tkc





