Re: How to loop over a text file (to remove tags and normalize) using Python

Peter Otten Wed, 10 Mar 2021 04:53:16 -0800

On 10/03/2021 13:19, S Monzur wrote:

> I initially scraped the links using beautiful soup, and from those links
> downloaded the specific content of the articles I was interested in
> (titles, dates, names of contributor, main texts) and stored that
> information in a list. I then saved the list to a text file.
> https://pastebin.com/8BMi9qjW . I am now trying to remove the html tags
> from this text file, and running into issues as mentioned in the previous
> post.


As I said in my previous post, when you process the list entries
separately you will probably avoid the problem.

Unfortunately with the format you chose to store your intermediate data
you cannot reconstruct it reliably.
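
To illustrate why, here is a minimal sketch. It assumes the file was
produced by writing str(list) of the scraped elements, which may not be
exactly what happened; the Tag class and the sample markup are
hypothetical stand-ins, not taken from the pastebin.

```python
class Tag:
    """Hypothetical stand-in for a scraped element whose repr is raw markup."""

    def __init__(self, markup):
        self.markup = markup

    def __repr__(self):
        return self.markup


# Two entries; the first one happens to contain ", " in its text.
entries = [Tag("<h1>Floods, storms</h1>"), Tag("<p>Text</p>")]

# This is the text that ends up in the file if the list is written as a string.
text = str(entries)

# Naively splitting on the list separator no longer finds the entry boundaries.
pieces = text[1:-1].split(", ")
print(len(entries), len(pieces))
```

The separator between entries is indistinguishable from a comma inside
an entry, so the original list boundaries are lost once the data is
flattened to text.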

I recommend that you either

(1) avoid the text file and extract the interesting parts from BeautifulSoup
directly or

(2) pick a different file format to store the result sets. For
short-term storage pickle
<https://docs.python.org/3/library/pickle.html#examples> should work.
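
For option (2), a minimal sketch of a pickle round trip might look like
this. The list of dicts, the field names, and the file name
articles.pickle are assumptions made for illustration; substitute
whatever structure your scraper actually builds.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical result set: one dict per article (field names are assumed).
articles = [
    {"title": "First, a title", "date": "2021-03-10",
     "author": "A. Writer", "text": "<p>Body, with commas</p>"},
    {"title": "Second title", "date": "2021-03-09",
     "author": "B. Writer", "text": "<p>Another body</p>"},
]

# Assumed location; any writable path works.
path = Path(tempfile.gettempdir()) / "articles.pickle"

# Pickle files must be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(articles, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

# The list comes back entry by entry, so each item can be cleaned separately.
for article in restored:
    print(article["title"])
```

Because the list structure survives the round trip, each entry can be
processed on its own. Only unpickle files you created yourself, since
loading a pickle from an untrusted source can execute arbitrary code.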

--
https://mail.python.org/mailman/listinfo/python-list
