On 10/03/2021 13:19, S Monzur wrote:
I initially scraped the links using Beautiful Soup, and from those links downloaded the specific content of the articles I was interested in (titles, dates, contributor names, main texts) and stored that information in a list. I then saved the list to a text file: https://pastebin.com/8BMi9qjW. I am now trying to remove the HTML tags from this text file, and I am running into the issues mentioned in the previous post.
As I said in my previous post, processing the list entries one at a time will probably avoid the problem. Unfortunately, with the format you chose to store your intermediate data, you cannot reconstruct the original list reliably. I recommend that you either (1) avoid the text file and extract the interesting parts from Beautiful Soup directly or (2) pick a different file format to store the result sets. For short-term storage pickle <https://docs.python.org/3/library/pickle.html#examples> should work.
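To illustrate option (2), here is a minimal sketch of a pickle round trip. The record layout (a list of dicts with title, date, contributor and text keys), the sample values, the file name and the two helper functions are all assumptions made for illustration, not your actual data or code. It also assumes each field has already been converted to a plain string before saving.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical scraped records; field names and values are made up.
# The text deliberately contains commas, quotes and a newline, which
# are the characters that make a naive text dump hard to parse back.
articles = [
    {
        "title": "<h1>First article</h1>",
        "date": "2021-03-01",
        "contributor": "A. Writer",
        "text": "<p>Body with 'quotes', commas,\nand a newline</p>",
    },
    {
        "title": "<h1>Second article</h1>",
        "date": "2021-03-02",
        "contributor": "B. Author",
        "text": "<p>Another body</p>",
    },
]


def save_articles(records, path):
    """Write the whole list to disk in pickle's binary format."""
    with open(path, "wb") as f:
        pickle.dump(records, f)


def load_articles(path):
    """Read the list back; only use this on files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "articles.pickle"
    save_articles(articles, path)
    restored = load_articles(path)

# The restored object is a list of dicts again, so each entry can be
# processed separately instead of parsing one big string.
print(restored == articles)
for entry in restored:
    print(entry["date"], entry["contributor"])
```

Because the list comes back as a list, you can strip the tags from each entry on its own. Note that pickle files should only be loaded from sources you trust, which is fine for your own intermediate data.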