On 20/06/18 20:32, Daniel Bosah wrote:
> # coding: latin-1
> from bs4 import BeautifulSoup
> from urllib.request import urlopen
> import re
>
> #new point to add... make rest of function then compare a list of monuments
> notaries ( such as blvd, road, street, etc.) to a list of words containing
> them. if contained, pass into new set ( ref notes in case)
>
>
> def regex(url):
>
>     html = urlopen(url).read()
>     soup = BeautifulSoup(html,"lxml") # why does lmxl fix it?

Fix what? You haven't given us any clue what you are talking about.
Did you have a problem? If so, what? And in what way did lxml fix it?

> What this code is doing is basically going through a webpage using
> BeautifulSoup and regex to compare a regexed list of words ( in regex ) to
> a list of keywords and then writing them to a textfile. The next function
> (regexparse) then goes and has an empty list (setss), then reads the
> textfile from the previous function. What I want to do, in a for loop, is
> check to see if words in monum and the textfile ( from the regex function )
> are shared, and if so , those shared words get added to the empty
> list(setss) , then written to a file ( this code is going to be added to a
> web crawler, and is basically going to be adding words and phrases to a
> txtfile as it crawls through the internet. ).
>
> However, every time I run the current code, I get all the
> textfile(sets.txt) from the previous ( regex ) function, even though all I
> want are words and phrases shared between the textfile from regex and the
> monum list from regexparse. How can I fix this?

So did lxml fix it? Since you are posting the question, I assume not.
Can you clarify what exactly you are asking?

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
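
PS. If the aim is only the words that appear both in the text file and in the monum list, a set intersection may be all that is needed. Below is a minimal sketch, not the posted code: shared_words is a hypothetical helper name, the sample lists are made up because the real contents of monum and sets.txt were not shown, and it assumes the comparison should ignore case.

```python
def shared_words(scraped_words, monum):
    """Return the words present in both collections, ignoring case."""
    # Normalise both sides so "Street" and "street" count as the same word.
    wanted = {word.strip().lower() for word in monum}
    found = {word.strip().lower() for word in scraped_words}
    # Set intersection keeps only the words common to both.
    return sorted(found & wanted)


# Hypothetical sample data standing in for monum and the contents of sets.txt.
monum = ["blvd", "road", "street"]
scraped = ["Main", "Street", "bakery", "road", "Road"]

print(shared_words(scraped, monum))  # ['road', 'street']
```

To use it on the real file, the words could be read with open("sets.txt").read().split() and passed in as scraped_words. Note that splitting on whitespace breaks multi-word phrases apart, so phrases would need a different way of reading the file.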