Umesh Sharma wrote: > Hello, > > I am writing a crawler in python, which crawl quora. I can't read the content > of quora without login. But google/bing crawls quora. One thing i can do is > use browser automation and login in my account and the go links by link and > crawl content, but this method is slow. So can any one tell me how should i > start in writing this crawler. > > I had never heard of quora. And I had to hunt a bit to find a link to this website. When you post a question here which refers to a non-Python site, you really should include a link to it.
You start with reading the page: http://www.quora.com/about/tos which you agreed to when you created your account with them. At one place it seems pretty clear that unless you make specific arrangements with Quora, you're limited to using their API. I suspect that they bend over backwards to get Google and the other big names to index their stuff. But that doesn't make it legal for you to do the same. In particular, the section labeled "Rules" makes constraints on automated crawling. And so do other parts of the TOS. Crawling is permissible, but not scraping. What's that mean? I dunno. Perhaps scraping is what you're describing above as "method is slow." I'm going to be looking to see what API's they offer, if any. I'm creating an account now. -- DaveA -- http://mail.python.org/mailman/listinfo/python-list