Dear all,
I have some troubles using the stemming algorithm provided by the tm
(text mining) + Snowball packages.
Here is my config:
MacOS 10.5
R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions)
I have installed all the needed packages (tm, rJava, rWeka, Snowball)
+ dependencies. I have desactivated AWT (like written in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html)
with :
Sys.setenv(NOAWT=TRUE)
The command tm_map(reuters, stemDocument) gives the following errors :
- First time:
Error in .jnew(name) :
java.lang.InternalError: Can't start the AWT because Java was
started on the first thread. Make sure StartOnFirstThread is not
specified in your application's Info.plist or on the command line
Refreshing GOE props...
- Second time:
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
(etc.)
I have already search the Web for a solution, but I have found nothing
useful.
Here is the full source code (all the librairies are already loaded):
------
Sys.setenv(NOAWT=TRUE)
source <- ReutersSource("reuters-21578.xml", encoding="UTF-8")
reuters <- Corpus(source)
reuters <- tm_map(reuters, as.PlainTextDocument)
reuters <- tm_map(reuters, removePunctuation)
reuters <- tm_map(reuters, tolower)
reuters <- tm_map(reuters, removeWords, stopwords("english"))
reuters <- tm_map(reuters, removeNumbers)
reuters <- tm_map(reuters, stripWhitespace)
reuters <- tm_map(reuters, stemDocument)
------
Thank you for your help,
Julien
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.