On Sun, 29 Jun 2014 10:32:00 -0700, subhabangalore wrote:

> I am opening multiple URLs with urllib.open, now one Url has huge html
> source files, like that each one has. As these files are read I am
> trying to concatenate them and put in one txt file as string.
> From this big txt file I am trying to take out each html file body of
> each URL and trying to write and store them
OK, let me clarify what I think you said. First you concatenate all the
web pages into a single file. Then you extract all the page bodies from
that single file and save them as separate files.

This seems a silly way to do things; why don't you just save each html
body section as you receive it? This sounds like it should be something
as simple as:

from BeautifulSoup import BeautifulSoup
import requests

urlList = [ "http://something/", "http://something/", "http://something/", ....... ]

n = 0
for url in urlList:
    # fetch the page and parse it
    r = requests.get( url )
    soup = BeautifulSoup( r.content )
    # pull out just the <body> element
    body = soup.find( "body" )
    # write it to its own numbered file
    fp = open( "scraped/body{:0>5d}.htm".format( n ), "w" )
    fp.write( body.prettify() )
    fp.close()
    n += 1

which will give you:

scraped/body00000.htm
scraped/body00001.htm
scraped/body00002.htm
........

for as many urls as you have in your url list. (Make sure the target
directory exists first!)

-- 
Denis McMahon, denismfmcma...@gmail.com