On Thu, Nov 12, 2009 at 11:29:50AM -0500, Mark Kimpel wrote:
> I am using code that previously worked to remove stopwords using package "tm".

Thanks for reporting. This is a bug in the removeWords() function in tm
version 0.5-1 available from CRAN:

> require(tm)
> myDocument <- c("the rain in Spain", "falls mainly on the plain",
>                 "jack and jill ran up the hill", "to fetch a pail of water")
> text.corp <- Corpus(VectorSource(myDocument))
> #########################
> text.corp <- tm_map(text.corp, stripWhitespace)
> text.corp <- tm_map(text.corp, removeNumbers)
> text.corp <- tm_map(text.corp, removePunctuation)
> ## text.corp <- tm_map(text.corp, stemDocument)
> text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english")))
> dtm <- DocumentTermMatrix(text.corp)
> dtm
> dtm.mat <- as.matrix(dtm)
> dtm.mat
>
> > dtm.mat
>     Terms
> Docs falls fetch hill jack jill mainly pail plain rain ran spain the water
>    1     0     0    0    0    0      0    0     0    1   0     1   1     0
>    2     1     0    0    0    0      1    0     1    0   0     0   0     0
>    3     0     0    1    1    1      0    0     0    0   1     0   0     0
>    4     0     1    0    0    0      0    1     0    0   0     0   0     1

The function removeWords() fails to remove patterns at the beginning or
at the end of a line. In the output above, "the" survives only in
document 1, where it is the first word; the occurrences in the middle of
documents 2 and 3 are removed as expected.

This bug is fixed in the latest development version on R-Forge, and the
fix will be included in the next CRAN release. Please see

  https://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/pkg/inst/NEWS?root=tm&view=markup

for a list of all bug fixes and changes between tm versions.

Best regards, Ingo Feinerer

-- 
Ingo Feinerer
Vienna University of Technology
http://www.dbai.tuwien.ac.at/staff/feinerer

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
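
The start-of-line and end-of-line failure described in the reply can be illustrated in base R without tm. The sketch below is not tm's source code. Both helper functions (remove_words_naive, remove_words_boundary) and the choice of "the" and "water" as the words to drop are hypothetical, chosen only to show how a pattern that requires surrounding spaces misses words at either end of a string, while a word-boundary pattern does not.

```r
# Illustration only: hypothetical helpers, NOT the implementation used by tm.
docs  <- c("the rain in Spain", "falls mainly on the plain",
           "jack and jill ran up the hill", "to fetch a pail of water")
words <- c("the", "water")

# Naive variant: a word is only matched when it has a space on both sides,
# so a word at the very start or very end of a string is never removed.
remove_words_naive <- function(x, words) {
  pattern <- paste0(" (", paste(words, collapse = "|"), ") ")
  gsub(pattern, " ", x)
}

# Boundary-aware variant: \b also matches at the start and end of a string.
remove_words_boundary <- function(x, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)
  x <- gsub(" +", " ", x)        # collapse the gaps left by removed words
  gsub("^ +| +$", "", x)         # trim leading and trailing spaces
}

print(remove_words_naive(docs, words))     # "the" (doc 1) and "water" (doc 4) survive
print(remove_words_boundary(docs, words))  # both are removed
```

Anyone affected by the bug in tm 0.5-1 could use a boundary-aware substitution of this kind as a stopgap on plain character vectors until the fixed release is available. Whether it fits a given corpus pipeline is an assumption to check locally.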