Code that previously removed stopwords with the "tm" package no longer
does so. Even manually adding "the" to the word list fails to remove "the".
The package has undergone extensive redevelopment, including changes to
function syntax, so perhaps I am just missing something.

Please see my simple example, output, and sessionInfo() below.

Thanks!
Mark

require(tm)
myDocument <- c("the rain in Spain", "falls mainly on the plain",
                "jack and jill ran up the hill", "to fetch a pail of water")
text.corp <- Corpus(VectorSource(myDocument))
#########################
text.corp <- tm_map(text.corp, stripWhitespace)
text.corp <- tm_map(text.corp, removeNumbers)
text.corp <- tm_map(text.corp, removePunctuation)
## text.corp <- tm_map(text.corp, stemDocument)
text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english")))
dtm <- DocumentTermMatrix(text.corp)
dtm
dtm.mat <- as.matrix(dtm)
dtm.mat

> dtm.mat
    Terms
Docs falls fetch hill jack jill mainly pail plain rain ran spain the water
   1     0     0    0    0    0      0    0     0    1   0     1   1     0
   2     1     0    0    0    0      1    0     1    0   0     0   0     0
   3     0     0    1    1    1      0    0     0    0   1     0   0     0
   4     0     1    0    0    0      0    1     0    0   0     0   0     1
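
Note that in the matrix above "the" survives only in document 1, where it is the first word; it was removed from documents 2 and 3, where it appears mid-sentence. As a cross-check that does not depend on tm, here is a minimal base-R sketch that removes whole words using \\b word boundaries. The names docs, stop.words, pattern, and cleaned are made up for illustration, and the short stop.words vector is an assumption standing in for stopwords("english"). This is not how removeWords is implemented; it only shows the result I would expect.

```r
# Base-R cross-check, independent of tm: strip whole words via \\b boundaries.
docs <- c("the rain in Spain", "falls mainly on the plain",
          "jack and jill ran up the hill", "to fetch a pail of water")

# Hypothetical short stopword list (an assumption, not tm's stopwords("english")).
stop.words <- c("the", "in", "on", "and", "up", "to", "a", "of")

# Build one alternation pattern that matches any stopword as a whole word.
pattern <- paste("\\b(", paste(stop.words, collapse = "|"), ")\\b", sep = "")

# Lowercase, drop the stopwords, then collapse and trim leftover whitespace.
cleaned <- gsub(pattern, "", tolower(docs), perl = TRUE)
cleaned <- gsub("\\s+", " ", cleaned, perl = TRUE)
cleaned <- gsub("^\\s+|\\s+$", "", cleaned, perl = TRUE)
cleaned
```

With this approach a leading "the" is treated the same as one in the middle of a document, which is the behaviour I expected from removeWords.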

R version 2.10.0 Patched (2009-10-27 r50222)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] chron_2.3-33 RWeka_0.3-23 tm_0.5-1

loaded via a namespace (and not attached):
[1] grid_2.10.0  rJava_0.8-1  slam_0.1-6   tools_2.10.0


Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please
