Crusier wrote:

> I have recently finished reading "Starting out with Python" and I
> really want to do some web scraping. Please kindly advise where I can
> get more information about BeautifulSoup. It seems that Documentation
> is too hard for me.

If you tell us what you don't understand and what you want to achieve,
we may be able to help you.

> from bs4 import BeautifulSoup
> import urllib.request
>
> HKFile = urllib.request.urlopen("https://bochk.etnet.com.hk/content/bochkweb/tc/quote_transaction_daily_history.php?code=2388")
> HKHtml = HKFile.read()
> HKFile.close()
>
> print(HKFile)

> Furthermore, I have tried to scrap this site but it seems that there
> is an error (<http.client.HTTPResponse object at 0x02C09F90>).

That's not an error; it is the object that urlopen() returns, and
print(HKFile) simply shows its default representation. If an error
occurs, Python libraries are usually explicit and throw an exception. If
the exception is not handled by your script, Python by default prints a
traceback and exits. For example:

>>> import urllib.request
>>> urllib.request.urlopen("http://httpbin.org/status/404")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 507, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: NOT FOUND

That's what a well-behaved error looks like ;)

> Please advise what I should do in order to overcome this.

If you want to print the contents of the page, just replace the line

> print(HKFile)

in your code with

print(HKHtml.decode("utf-8"))
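
If you would rather handle the exception yourself than let the script
stop with a traceback, something along these lines might serve as a
starting point. It is only a rough sketch: fetch_text and
fallback_encoding are names I made up for illustration, and falling back
to UTF-8 when the server does not declare a charset is an assumption
that may not hold for every page.

```python
import urllib.error
import urllib.request


def fetch_text(url, fallback_encoding="utf-8"):
    """Return the decoded body of url, or None if the server reports an HTTP error.

    fetch_text and fallback_encoding are hypothetical names used for this sketch.
    """
    try:
        # The with-block closes the response for us, even if read() fails.
        with urllib.request.urlopen(url) as response:
            raw = response.read()  # bytes, not str
            # Prefer the charset declared in the Content-Type header, if any;
            # otherwise fall back to the assumed default.
            encoding = response.headers.get_content_charset() or fallback_encoding
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code and reason sent by the server.
        print("Request failed:", err.code, err.reason)
        return None
    return raw.decode(encoding)


# Possible usage (needs network access):
# text = fetch_text("http://httpbin.org/status/404")
# if text is not None:
#     print(text)
```

Only HTTPError is caught here; other problems, such as a missing network
connection, still end in a traceback, which is usually what you want
while you are learning.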