Problem accessing a web page

2008-12-15 Thread Antoni Mont
Hi all,

My apologies if this is not the appropriate group.

I'd like to access a web site from a Python script. The page I want is,
in fact, a form reached from a main page. With a browser (Firefox, for
instance) I can do it without problems: I open the main page, whose URL is:

'http://www.mcu.es/webISBN/tituloSimpleFilter.do?cache=init&prev_layout=busquedaisbn&layout=busquedaisbn&language=es'

and then, from the same tab (or another one), I open the form (it's a
book database and the ISBN is the relevant parameter), whose URL is:

'http://www.mcu.es/webISBN/tituloSimpleDispatch.do?params.forzaQuery=N&params.cisbnExt=8484031128&action=Buscar&layout=busquedaisbn'

So I get the information about the book.

But when I try to do the same from the script, I get a time-out
error, without any time elapsing at all. This is the relevant part
of the script:

#!/usr/bin/python
# coding: latin-1
import os, sys
import types
import time
import string
import fileinput
import re
import urllib

# open main url
mcup = urllib.urlopen('http://www.mcu.es/webISBN/tituloSimpleFilter.do?cache=init&prev_layout=busquedaisbn&layout=busquedaisbn&language=es')
jonk = mcup.read()      # read it; the content does not matter
# open form for isbn
mcui = urllib.urlopen('http://www.mcu.es/webISBN/tituloSimpleDispatch.do?params.forzaQuery=N&params.cisbnExt=8484031128&action=Buscar&layout=busquedaisbn')
pagllibre = mcui.read() # read it
print pagllibre         # and print it
mcui.close()            # close form
mcup.close()            # close main page

Thanks in advance, I'd appreciate any help.

Regards,
Antoni Mont
  

--
http://mail.python.org/mailman/listinfo/python-list


Re: Problem accessing a web page

2008-12-15 Thread Antoni Mont
Tim Chase wrote:

> When you get the second page, are you getting the same content
> back that you get if you do a search in your favorite browser?
> 
> Using just
> 
>content = urllib.urlopen(url2).read()
>'Error' in content # True
>'Friedrich' in content # False
> 
> However, when you browse to the page, those two should be inverted:
> 
>'Error' in content # False
>'Friedrich' in content # True
> 
> I've tried adding in the parameters correctly via post
> 
>params = urllib.urlencode([
>  ('params.forzaQuery', 'N'),
> ...
>  ('layout', 'busquedaisbn'),
>  ])
>content = urllib.urlopen(url2, params).read()
> 
> However, this too fails because the underlying engine expects a
> session ID in the URL.  I finally got it to work with the code below:
> 
>import urllib
> 
>data = [
>  ('params.forzaQuery', 'N'),
>  ('params.cdispo', 'A'),
>  ('params.cisbnExt', '8484031128'),
>  ('params.liConceptosExt[0].texto', ''),
>  ('params.orderByFormId', '1'),
>  ('action', 'Buscar'),
>  ('language', 'es'),
>  ('prev_layout', 'busquedaisbn'),
>  ('layout', 'busquedaisbn'),
>  ]
> 
>params = urllib.urlencode(data)
> 
>url = 'http://www.mcu.es/webISBN/tituloSimpleDispatch.do;jsessionid=5E8D9A11E4A28BDF0BA6B254D0118262'
> 
>fp = urllib.urlopen(url, params)
>content = fp.read()
>fp.close()
> 
> 
> but I had to hard-code the jsessionid parameter in the URL.  This
> would have to be determined from the initial call & response of
> the initial URL (the initial URL returns a <form> element with
> the URL to POST to, including this magic jsessionid parameter).
> 
> Hope this helps nudge you (the OP) in the right direction to get
> what you're looking for.
> 
> -tkc

OK, Tim, I think you got the point. The jsessionid changes in every
response of the initial URL, so I need to read it and keep using it
during the session. Now I must figure out how to do it.
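
One possible way to do that is sketched below. It is only a sketch and
has not been run against the live site. It assumes, following Tim's
description, that the initial page contains a form whose action attribute
holds the URL to POST to, jsessionid included. It is written for Python 3
(urllib.request and urllib.parse) rather than the Python 2 urllib used
above, it assumes the pages can be decoded as latin-1, and the helper
names find_form_action, build_post_data and search_isbn are made up for
the example.

```python
import re
import urllib.parse
import urllib.request

START_URL = ('http://www.mcu.es/webISBN/tituloSimpleFilter.do'
             '?cache=init&prev_layout=busquedaisbn'
             '&layout=busquedaisbn&language=es')

# Assumption: the jsessionid appears inside the action attribute of a
# <form> tag on the initial page (quoted with " or ').
FORM_ACTION_RE = re.compile(
    r'<form[^>]*\baction=["\']([^"\']*jsessionid=[^"\']*)["\']',
    re.IGNORECASE)


def find_form_action(html):
    """Return the form action containing a jsessionid, or None."""
    match = FORM_ACTION_RE.search(html)
    return match.group(1) if match else None


def build_post_data(isbn):
    """URL-encode the form fields from Tim's message as a bytes body."""
    fields = [
        ('params.forzaQuery', 'N'),
        ('params.cdispo', 'A'),
        ('params.cisbnExt', isbn),
        ('params.liConceptosExt[0].texto', ''),
        ('params.orderByFormId', '1'),
        ('action', 'Buscar'),
        ('language', 'es'),
        ('prev_layout', 'busquedaisbn'),
        ('layout', 'busquedaisbn'),
    ]
    return urllib.parse.urlencode(fields).encode('ascii')


def search_isbn(isbn):
    """Fetch the initial page, pick up the session URL, then POST."""
    with urllib.request.urlopen(START_URL) as fp:
        html = fp.read().decode('latin-1')  # encoding is an assumption
    action = find_form_action(html)
    if action is None:
        raise ValueError('no form action with a jsessionid found')
    # The action may be relative, so resolve it against the start URL.
    post_url = urllib.parse.urljoin(START_URL, action)
    # Passing a data argument makes urlopen send a POST request.
    with urllib.request.urlopen(post_url, build_post_data(isbn)) as fp:
        return fp.read().decode('latin-1')
```

Calling search_isbn('8484031128') would then return the result page as
text, if the assumptions above hold. Reading the jsessionid fresh on each
run avoids hard-coding a value that the server replaces for every session.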

Thank you very much to you and also to Chris.
Kind regards,
Toni 
