Re: Problem accessing a web page

Tim Chase Mon, 15 Dec 2008 12:56:40 -0800

I'm able to grab the problem webpage via Python just fine, albeit with
a bit of a delay, so I don't know what your exact problem is. Maybe
it's your connection?

When you get the second page, are you getting the same content back
that you get if you do a search in your favorite browser?


Using just

  content = urllib.urlopen(url2).read()
  'Error' in content # True
  'Friedrich' in content # False

However, when you browse to the page, those two should be inverted:

  'Error' in content # False
  'Friedrich' in content # True

I've tried passing the parameters properly via POST:

  params = urllib.urlencode([
    ('params.forzaQuery', 'N'),
...
    ('layout', 'busquedaisbn'),
    ])
  content = urllib.urlopen(url2, params).read()

However, this too fails because the underlying engine expects a
session ID in the URL. I finally got it to work with the code below:


  import urllib

  data = [
    ('params.forzaQuery', 'N'),
    ('params.cdispo', 'A'),
    ('params.cisbnExt', '8484031128'),
    ('params.liConceptosExt[0].texto', ''),
    ('params.orderByFormId', '1'),
    ('action', 'Buscar'),
    ('language', 'es'),
    ('prev_layout', 'busquedaisbn'),
    ('layout', 'busquedaisbn'),
    ]

  params = urllib.urlencode(data)

  url = 'http://www.mcu.es/webISBN/tituloSimpleDispatch.do;jsessionid=5E8D9A11E4A28BDF0BA6B254D0118262'


  fp = urllib.urlopen(url, params)
  content = fp.read()
  fp.close()

but I had to hard-code the jsessionid parameter in the URL. This
would have to be determined from the initial call & response of
the initial URL (the initial URL returns a <FORM> element with
the URL to POST to, including this magic jsessionid parameter).
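
For what it's worth, here is a rough sketch of how that jsessionid
could be pulled out of the first response instead of being hard-coded.
It is untested against the real site and rests on assumptions: the
helper name find_post_url, the sample markup, and the base URL
'http://www.mcu.es/webISBN/' are made up for illustration, and it
assumes the form's action attribute is quoted and contains the
jsessionid. A regular expression is used only to keep the example
short; a real HTML parser would be more robust.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, which this thread uses

# Matches the quoted action attribute of a <form> tag, but only when
# that attribute carries a jsessionid.
FORM_ACTION_RE = re.compile(
    r'<form\b[^>]*?\baction\s*=\s*["\']([^"\']*jsessionid=[^"\']*)["\']',
    re.IGNORECASE)


def find_post_url(html, base_url):
    """Return the absolute URL of the first form action containing a
    jsessionid, or None if no such form is found."""
    match = FORM_ACTION_RE.search(html)
    if match is None:
        return None
    # The action may be relative, so resolve it against the page URL.
    return urljoin(base_url, match.group(1))


# Made-up markup standing in for the real first response (assumption).
sample = ('<html><body><FORM name="f" method="post" '
          'action="/webISBN/tituloSimpleDispatch.do;jsessionid=ABC123">'
          '</FORM></body></html>')
post_url = find_post_url(sample, 'http://www.mcu.es/webISBN/')
```

In the real flow, the html argument would be the body returned by
fetching the initial search page, and the resulting post_url would
take the place of the hard-coded url passed to urllib.urlopen(url,
params) in the code above.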

Hope this helps nudge you (the OP) in the right direction to get
what you're looking for.


-tkc






--
http://mail.python.org/mailman/listinfo/python-list
