Fetching Google News used to fail with the high-level functions provided by httplib and the like. However, I found this piece of code somewhere, and it worked:

    import httplib

    def gopen():
        # Plain HTTPS request for the Mexican Spanish edition, sending
        # browser-like headers; redirects are NOT followed by httplib
        http = httplib.HTTPSConnection('news.google.com')
        http.request("GET", "/news?ned=es_MX",
                     headers={"User-Agent": "Mozilla/5.0 (X11; U; Linux i686; es-MX) AppleWebKit/532.8 (KHTML, like Gecko) Chrome/4.0.277.0 Safari/532.8",
                              "Host": 'news.google.com',
                              "Accept": "*/*"})
        return http.getresponse()

A few days ago Google News was revamped, and the code no longer works (2.6/Win7, 2.7/OSX and, with minimal changes, 3.6/Win7): the page contents are empty. The code itself doesn't raise any errors. What is the proper way to do it now? I must stick to the standard library.
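For comparison, a Python 3 version of `gopen()` with what are presumably the "minimal changes" (a guess, since the 3.6 code isn't shown) only swaps `httplib` for its Python 3 name, `http.client`. The `HEADERS` constant and the `conn` variable name are illustrative choices; `conn` avoids shadowing the `http` package:

```python
import http.client

# Browser-like headers copied from the original snippet (illustrative constant)
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (X11; U; Linux i686; es-MX) AppleWebKit/532.8 "
                   "(KHTML, like Gecko) Chrome/4.0.277.0 Safari/532.8"),
    "Accept": "*/*",
}


def gopen():
    """Request the es_MX edition and return the raw response (no redirects followed)."""
    # http.client sends a Host header on its own, so it is not set explicitly here
    conn = http.client.HTTPSConnection("news.google.com")
    conn.request("GET", "/news?ned=es_MX", headers=HEADERS)
    return conn.getresponse()
```

Against the revamped site this would presumably show the same symptom as the 2.x code: a 302 response with an empty body.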

The returned headers are:

----------------------
    [('Content-Type', 'application/binary'),
     ('Cache-Control', 'no-cache, no-store, max-age=0, must-revalidate'),
     ('Pragma', 'no-cache'),
     ('Expires', 'Mon, 01 Jan 1990 00:00:00 GMT'),
     ('Date', 'Thu, 13 Jul 2017 16:37:48 GMT'),
     ('Location', 'https://news.google.com/news/?ned=es_mx&hl=es'),
     ('Strict-Transport-Security', 'max-age=10886400'),
     ('P3P',
      'CP="This is not a P3P policy! See '
      'https://support.google.com/accounts/answer/151657?hl=en for more info."'),
     ('Server', 'ESF'),
     ('Content-Length', '0'),
     ('X-XSS-Protection', '1; mode=block'),
     ('X-Frame-Options', 'SAMEORIGIN'),
     ('X-Content-Type-Options', 'nosniff'),
     ('Set-Cookie',
      'NID=107=qwH7N2hB12zVGfFzrAC2CZZNhrnNAVLEmTvDvuSzzw6mSlta9D2RDZVP9t5gEcq_WJjZQjDSWklJ7LElSnAZnHsiF4CXOwvGDs2tjrXfP41LE-6LafdA86GO3sWYnfWs;Domain=.google.com;Path=/;Expires=Fri, '
      '12-Jan-2018 16:37:48 GMT;HttpOnly'),
     ('Alt-Svc', 'quic=":443"; ma=2592000; v="39,38,37,36,35"')]
-----------------------

`read()` returns an empty string ('' on 2.x, b'' on 3.x), `status` is 302 and `reason` is `Found`.
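A 302 with `Content-Length: 0` means the server is redirecting rather than serving the page; the `Location` header above points to `https://news.google.com/news/?ned=es_mx&hl=es`. `httplib`/`http.client` never follows redirects by itself, so the request has to be repeated against that URL. Below is a minimal Python 3 sketch under stated assumptions: the target accepts the same headers, five hops (the hypothetical `max_redirects=5`) are enough, and `redirect_target` and `fetch` are illustrative names, not library functions:

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(status, headers, base_url):
    """Return the absolute URL to follow, or None if this is not a redirect."""
    if status not in REDIRECT_CODES:
        return None
    location = dict((k.lower(), v) for k, v in headers).get("location")
    # Location may be relative, so resolve it against the URL just requested
    return urljoin(base_url, location) if location else None


def fetch(url, headers, max_redirects=5):
    """GET url via http.client, following up to max_redirects redirects by hand.

    Do not put a Host header in `headers`: http.client sets it per request,
    which matters if a redirect points to a different host.
    """
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        target = redirect_target(resp.status, resp.getheaders(), url)
        body = resp.read()  # read (or drain) the body before closing
        conn.close()
        if target is None:
            return resp.status, body
        url = target
    raise RuntimeError("too many redirects")
```

Usage would be along the lines of `status, body = fetch("https://news.google.com/news?ned=es_MX", {"User-Agent": "...", "Accept": "*/*"})`. Alternatively, `urllib.request.urlopen` (`urllib2.urlopen` on 2.x) follows 301/302 redirects automatically, so it may be worth retrying with a `Request` that carries the same `User-Agent`; whether Google now accepts that is untested here.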

Javier
--
https://mail.python.org/mailman/listinfo/python-list
