Greetings to all,

I have a similar issue with Snowball. I am running R version 2.12.1 (2010-12-16) on Windows 7.
Here is my script:

----
library(tm)

# Locate and display the sample XML file shipped with tm
custom.xml <- system.file("texts", "custom.xml", package = "tm")
print(readLines(custom.xml), quote = FALSE)

# Reader that maps XML nodes to document metadata and content
myXMLReader <- readXML(
    spec = list(
        Language      = list("node", "/document/language"),
        DateTimeStamp = list("node", "/document/date"),
        Origin        = list("node", "/document/source"),
        Description   = list("node", "/document/subject"),
        Type          = list("node", "/document/country"),
        Heading       = list("node", "/document/title"),
        Content       = list("node", "/document/contenu"),
        Author        = list("node", "/document/author")),
    doc = PlainTextDocument())

# Source that treats each child of the XML root as one document
mySource <- function(x, encoding = "UTF-8")
    XMLSource(x, function(tree) XML::xmlRoot(tree)$children, myXMLReader, encoding)

corpusmf <- Corpus(mySource(custom.xml))
meta(corpusmf[[1]])
meta(corpusmf[[2]])

corpusmf <- tm_map(corpusmf, stripWhitespace)
corpusmf <- tm_map(corpusmf, removeNumbers)
corpusmf <- tm_map(corpusmf, removePunctuation)
# This is the step that fails
corpusmf <- tm_map(corpusmf, stemDocument)

matrix <- TermDocumentMatrix(corpusmf, control = list(weighting = weightBin))
print(matrix)
----

stemDocument returns this error message:

Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!

I tried calling library(Snowball) first, but the result is the same.

I found a clue on the Weka website:
http://weka.wikispaces.com/The+snowball+stemmers+don%27t+work,+what+am+I+doing+wrong%3F
but I don't understand what I should do with these archives.

I would be grateful if someone could help with this.

Kind regards,

--
View this message in context: http://r.789695.n4.nabble.com/Problem-with-Snowball-RWeka-tp3402126p3569089.html
Sent from the R help mailing list archive at Nabble.com.