Thanks Ingo.
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please
On Sun, Nov 15, 2009 at 11:05 AM, Ingo Feinerer wrote:
On Thu, Nov 12, 2009 at 11:29:50AM -0500, Mark Kimpel wrote:
> I am using code that previously worked to remove stopwords using package "tm".
Thanks for reporting. This is a bug in the removeWords() function in
tm version 0.5-1 available from CRAN:
> require(tm)
> myDocument <- c("the rain in Spa
Mark,
It looks like removeWords removed "the" in all instances except when
"the" was the first word in your text. Maybe there is a parameter that
needs to be set? I couldn't find anything on the help page.
Here's an example of what I am seeing using the "crude" dataset
#function re
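For anyone hitting the same symptom, here is a minimal base-R sketch of one way to remove whole words so that a match at the very start of the text is also caught. It anchors each word with `\b` word boundaries rather than relying on surrounding spaces. The helper name `remove_words_wb` is hypothetical and not part of tm, and this is not a claim about how removeWords() is implemented in any tm version. It assumes the word list contains plain words with no regex metacharacters. Matching is case-sensitive, so lowercase the text first if a capitalized "The" should also be removed.

```r
# Hypothetical helper, not part of tm: remove whole words using \b
# word boundaries, so a word at the very start of the string is caught too.
# Assumes `words` are plain words with no regex metacharacters.
remove_words_wb <- function(x, words) {
  # Build one alternation such as \b(the|in)\b
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  out <- gsub(pattern, "", x, perl = TRUE)
  # Collapse the runs of spaces left behind and trim both ends.
  out <- gsub("\\s+", " ", out, perl = TRUE)
  gsub("^\\s+|\\s+$", "", out, perl = TRUE)
}

remove_words_wb("the rain in Spain", c("the", "in"))
# "rain Spain"  (the "in" inside "Spain" is left alone)
```

Because `gsub()` is vectorized, the same call works on a character vector holding one string per document.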
Sam,
Thanks for the example. Removing stop words after the DocumentTermMatrix has
been created works fine if one is working with single words, but what if one
is creating a dtm of possible combinations of words (n-grams)? Wouldn't one
want to remove the stop words from the corpus before the combinations are formed?
Mark
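The question about word combinations can be illustrated without tm. The base-R sketch below uses hypothetical helper names (`tokenize_ws`, `make_bigrams`) and an illustrative two-word stopword list, and builds bigrams two ways. The first drops any bigram containing a stopword after the pairs are formed. The second removes stopwords from the tokens first and then pairs what is left. The second route produces pairs such as "rain spain" that never occur next to each other in the original sentence, so which behaviour is wanted depends on the analysis.

```r
# Hypothetical helpers, not tm functions.
# Split lowercased text on whitespace and drop empty tokens.
tokenize_ws <- function(x) {
  tok <- strsplit(tolower(x), "\\s+")[[1]]
  tok[tok != ""]
}

# Pair each token with its successor: "a b c" -> "a b", "b c".
make_bigrams <- function(tok) {
  if (length(tok) < 2) return(character(0))
  paste(head(tok, -1), tail(tok, -1))
}

stops <- c("the", "in")  # illustrative stopword list (assumption)
txt <- "the rain in Spain falls mainly in the plain"
tok <- tokenize_ws(txt)

# Route 1: build bigrams, then drop any pair containing a stopword.
all_pairs <- make_bigrams(tok)
has_stop <- vapply(strsplit(all_pairs, " "),
                   function(p) any(p %in% stops), logical(1))
filter_after <- all_pairs[!has_stop]

# Route 2: drop stopwords from the tokens, then build bigrams.
filter_before <- make_bigrams(tok[!tok %in% stops])

print(filter_after)   # "spain falls" "falls mainly"
print(filter_before)  # "rain spain" "spain falls" "falls mainly" "mainly plain"
```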
I'm not sure what's wrong with your approach, but this seems to strip "the":
require(tm)
params <- list(minDocFreq = 1,
removeNumbers = TRUE,
stemming = TRUE,
stopwords = TRUE,
I am using code that previously worked to remove stopwords using package
"tm". Even manually adding "the" to the list does not work to remove "the".
This package has undergone extensive redevelopment with changes to the
function syntax, so perhaps I am just missing something.
Please see my simple