Hey Crusier (and others...),

For your site...
As Alan mentioned, it's a mix of HTML/JavaScript/etc. So you're going to need (or at least should try) to extract just the JSON/struct that you need, and then go from there.

I speak from experience, as I've had to handle a number of sites that are essentially just what you have.

Here's a basic guide to start:

--I use libxml and simplejson.
--Fetch the page.
--In the page, do a split to get the exact JSON (string) that you want.
  -You'll do two splits:
   1st gets rid of the extra pre-JSON stuff
   2nd gets rid of the extra post-JSON stuff that you don't need
--At this point, you should have the JSON string you need, or you should be pretty close.
  -Now, you might need to "pretty up" what you have, as py/json only accepts keys/values in a certain format (double-quoted strings rather than single-quoted, etc.).

Once you've gotten this far, you might actually have the JSON string, in which case you can load it directly with json and proceed as you wish.

You might also find that what you have is really a Python dictionary, and you can handle that as well!

Have fun, and let us know if you have issues...

On Sun, Dec 13, 2015 at 2:44 AM, Crusier <crus...@gmail.com> wrote:
> Dear All,
>
> I am trying to scrape the following website; however, I have
> encountered some problems. As you can see, I am not really familiar
> with regex and I hope you can give me some pointers on how to solve
> this problem.
>
> I hope I can download all the transaction data into the database.
> However, I need to retrieve it first. The data which I hope to
> retrieve is as follows:
>
> "
> 15:59:59 A 500 6.790 3,395
> 15:59:53 B 500 6.780 3,390................
>
> Thank you
>
> Below is my code:
>
> from bs4 import BeautifulSoup
> import requests
> import re
>
> url = 'https://bochk.etnet.com.hk/content/bochkweb/eng/quote_transaction_daily_history.php?code=6881&time=F&timeFrom=090000&timeTo=160000&turnover=S&sessionId=44c99b61679e019666f0570db51ad932&volMin=0&turnoverMin=0'
>
> def turnover_detail(url):
>     response = requests.get(url)
>     html = response.content
>     soup = BeautifulSoup(html, "html.parser")
>     data = soup.find_all("script")
>     for json in data:
>         print(json)
>
> turnover_detail(url)
>
> Best Regards,
> Henry
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
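
To make the two-split idea above a bit more concrete, here's a rough sketch. It's only an illustration under assumptions: the sample page text, the "var transactions = " marker, and the field names are all made up (I haven't checked what the real page's script looks like), and it uses the standard-library json module rather than simplejson. You'd swap in whatever markers actually surround the data in your page.

```python
import ast
import json

# Hypothetical page fragment; the real page's markers and field names will differ.
SAMPLE_HTML = """
<html><body>
<script>
var pageTitle = "demo";
var transactions = {"rows": [{"time": "15:59:59", "side": "A", "volume": 500, "price": 6.79}]};
initPage(transactions);
</script>
</body></html>
"""


def extract_between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker."""
    # First split: drop everything up to and including the start marker.
    after_start = text.split(start_marker, 1)[1]
    # Second split: drop everything from the end marker onward.
    return after_start.split(end_marker, 1)[0]


def parse_struct(raw):
    """Try strict JSON first, then fall back to a Python literal."""
    try:
        return json.loads(raw)
    except ValueError:
        # Single-quoted keys/values are not valid JSON,
        # but may still be a valid Python dictionary literal.
        return ast.literal_eval(raw)


raw = extract_between(SAMPLE_HTML, "var transactions = ", ";\n")
data = parse_struct(raw)
print(data["rows"][0]["price"])
```

If a marker isn't in the page, the first split raises an IndexError, which is a handy signal that the page layout isn't what you assumed. ast.literal_eval only evaluates literals, so it's a safer fallback than eval for the "it's really a py dictionary" case.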