Yes, Piet, you were right: this works. However, it does not seem to work on Google App Engine, since App Engine appends its own agent info to the User-Agent header, as seen below:
'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13 AppEngine-Google; (+http://code.google.com/appengine)'

Anyway, thanks. Good to know about the User-Agent field.

Jitu

On Aug 11, 12:36 am, Piet van Oostrum <p...@cs.uu.nl> wrote:
> >>>>> jitu <nair.jiten...@gmail.com> (j) wrote:
>
> >j> Hi,
> >j> An HTML page contains 'anchor' elements whose 'href' attribute has
> >j> a semicolon in the URL. When fetching the page using
> >j> urllib2.urlopen, all such hrefs containing semicolons are
> >j> truncated.
> >j> For example, the href
> >j> http://travel.yahoo.com/p-travelguide-6901959-pune_restaurants-i;_ylt...
> >j> gets truncated to
> >j> http://travel.yahoo.com/p-travelguide-6901959-pune_restaurants-i
> >j> The page I am talking about can be fetched from
> >j> http://travel.yahoo.com/p-travelguide-485468-pune_india_vacations-i;_...
>
> It's not Python that causes this. It is the server that sends you the
> URLs without these parameters (that's what they are).
>
> To get them you have to tell the server that you are a respectable
> browser. E.g.
>
> import urllib2
>
> url = 'http://travel.yahoo.com/p-travelguide-6901959-pune_restaurants-i;_ylt...
>
> url = 'http://travel.yahoo.com/p-travelguide-485468-pune_india_vacations-i;_...
>
> hdrs = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13',
>         'Accept': 'image/*'}
>
> request = urllib2.Request(url = url, headers = hdrs)
> page = urllib2.urlopen(request).read()
>
> --
> Piet van Oostrum <p...@cs.uu.nl>
> URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
> Private email: p...@vanoostrum.org
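For readers on Python 3, where urllib2's Request and urlopen moved into urllib.request, Piet's browser-header technique might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not code from the thread: the helper names build_request, fetch and url_params are hypothetical, and the `_ylt=example` parameter is made up because the real URLs in the messages are truncated. It also does nothing about the App Engine behavior Jitu reports, where App Engine appends its own string to the User-Agent.

```python
# Python 3 sketch: urllib2's Request/urlopen now live in urllib.request.
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Browser-like headers copied from Piet's example.
BROWSER_HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; '
                   'rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13'),
    'Accept': 'image/*',
}


def build_request(url, headers=None):
    """Build a Request carrying browser-like headers (no network I/O).

    Request stores header names via str.capitalize(), so 'User-Agent'
    must be looked up as 'User-agent' with get_header().
    """
    return Request(url, headers=dict(headers or BROWSER_HEADERS))


def fetch(url):
    """Fetch the page body as bytes; this performs a real HTTP request."""
    with urlopen(build_request(url)) as resp:
        return resp.read()


def url_params(url):
    """Return the ';' parameters of the last path segment, or '' if none."""
    return urlparse(url).params


if __name__ == '__main__':
    # Hypothetical URL: the real ones in the thread are truncated.
    url = 'http://travel.yahoo.com/p-travelguide-6901959-pune_restaurants-i;_ylt=example'
    print(build_request(url).get_header('User-agent'))
    print(url_params(url))  # prints: _ylt=example
```

urlparse reports the text after the semicolon in the last path segment as the `params` component, which fits Piet's remark that the truncated parts are URL parameters the server chose to leave out, rather than something Python stripped.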