Re: Wikipedia XML Dump

2014-01-28 Thread Kevin Glover
Thanks for the comments, guys. The Wikipedia download is a single XML document, 43.1GB. Any further thoughts? Kevin -- https://mail.python.org/mailman/listinfo/python-list
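
A file of that size cannot sensibly be loaded into memory in one go, so one common approach is to stream it. The following is a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`; it assumes the dump follows the usual MediaWiki export layout of `<page>` elements each containing a `<title>`, and the helper names `local_name` and `iter_titles` are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(source):
    """Yield page titles from an XML stream without building the whole tree.

    Assumes <page> elements with a <title> child (MediaWiki export layout).
    """
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == "page":
            title = None
            for child in elem:
                if local_name(child.tag) == "title":
                    title = child.text
            yield title
            # Drop everything parsed so far so memory use stays roughly flat.
            root.clear()


# Tiny in-memory stand-in for the real dump file.
sample = (b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
          b'<page><title>A</title><text>x</text></page>'
          b'<page><title>B</title></page>'
          b'</mediawiki>')
titles = list(iter_titles(io.BytesIO(sample)))
```

For the real dump, the `io.BytesIO` object would be replaced by the path to the XML file or by a file opened in binary mode.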

Exclude text within quotation marks and words beginning with a capital letter

2015-12-01 Thread Kevin Glover
I am working on a program that is written in Python 2.7 to be compatible with the POS tagger that I import from Pattern. The tagger identifies all the nouns in a text. I need to exclude from the tagger any text that is within quotation marks, and also any word that begins with an upper case lett
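
One way to approach this is to filter the text before it reaches the tagger. The sketch below uses only the `re` module and does not touch Pattern itself. It assumes quotations use straight double quotes, and the function name `strip_excluded` is hypothetical. Note that it also drops ordinary sentence-initial words, since they begin with a capital letter too.

```python
import re

# Anything between a pair of straight double quotes.
QUOTED = re.compile(r'"[^"]*"')
# A word starting with an upper-case ASCII letter, including forms like "Alice's".
CAPITALIZED = re.compile(r"\b[A-Z][\w'-]*")


def strip_excluded(text):
    """Remove quoted passages and capitalised words, then tidy the whitespace."""
    text = QUOTED.sub(" ", text)
    text = CAPITALIZED.sub(" ", text)
    return " ".join(text.split())


cleaned = strip_excluded('the cat said "hello there" to Alice and the dog')
```

The cleaned string can then be handed to the tagger in place of the original text.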

Insert variable into text search string

2014-02-19 Thread Kevin Glover
These two lines are from a program that carries out a phrase search of Wikipedia and returns the total number of times that the specific phrase occurs. It is essential that the search contains an apostrophe:

    results = w.search("\"of the cat's\"", type=ALL, start=1, count=1)
    print results.total
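
A minimal sketch of building that search string from a variable with the `%` operator. Putting the double quotes inside a single-quoted template means neither the quotes nor the apostrophe needs escaping. The helper name `quoted_phrase` is made up for illustration; `w` and `ALL` come from the original program and are not defined here, so the search call is left as a comment.

```python
def quoted_phrase(phrase):
    """Wrap a phrase in double quotes so it is searched as an exact phrase."""
    return '"%s"' % phrase


phrase = "of the cat's"
query = quoted_phrase(phrase)

# In the original program the call would then become (w and ALL as defined there):
# results = w.search(query, type=ALL, start=1, count=1)
```

Here `query` holds the same string as the hard-coded literal in the two lines above, so the search itself should behave the same.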

Re: Insert variable into text search string

2014-02-19 Thread Kevin Glover
Thank you both so much. I had tried % but not successfully.