Thanks all for your help. I fear text mining is an abstract little corner of "R".
I have imported 3228 plain-text (.txt) files, each containing a news story, into R using the tm package:

    textd <- Corpus(DirSource("other/docs"), readerControl = list(reader = readPlain))

I can pre-process an individual document with tolower(textd[[1]]). However, when I try to run tmTolower() I get an error saying the command does not exist, and the TermDocumentMatrix() command then gives me a peculiar error:

    > other.TDM <- TermDocumentMatrix(textd, control = list(stopwords = TRUE))
    Error in tolower(txt) : invalid input 'Valentino bag, breakfasting at West Palm Beach café Testa . . . VALENTINO, in' in 'utf8towcs'

Is it something to do with the structure of the documents I've read in? The tm documentation is *extremely* abstract for someone at my Neanderthal level.

Thanks to anyone who can help.

--
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
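The 'utf8towcs' message most likely means that some stories contain bytes that are not valid UTF-8 (the "café" in the quoted text is a typical case). TermDocumentMatrix() appears to lower-case text by default, so it fails as soon as tolower() meets such a string. One possible repair is a minimal base-R sketch, assuming the files are Latin-1 encoded (the post does not say which encoding they use). clean_lower() is a hypothetical helper for illustration, not a tm function.

```r
# Minimal base-R sketch, ASSUMING the news files are Latin-1 encoded.
# clean_lower() is a hypothetical helper, not part of tm.
clean_lower <- function(x, from = "latin1") {
  # Find strings whose bytes are not valid UTF-8 (e.g. a Latin-1 "é")
  bad <- !is.na(x) & !validUTF8(x)
  # Re-encode only those strings; iconv() reads them as `from`,
  # ignoring any (wrong) encoding mark, and marks the result UTF-8
  x[bad] <- iconv(x[bad], from = from, to = "UTF-8")
  # tolower() can now decode every non-ASCII character
  tolower(x)
}

# Example: "café" written with the single Latin-1 byte 0xE9
snippet <- "Valentino bag, breakfasting at West Palm Beach caf\xe9 Testa"
# Mislabel it as UTF-8, mimicking how the corpus text was probably marked
Encoding(snippet) <- "UTF-8"
lowered <- clean_lower(snippet)
print(lowered)

# With tm (0.6 or later) the helper could be applied to every document,
# after which TermDocumentMatrix() should no longer hit invalid bytes:
#   textd <- tm_map(textd, content_transformer(clean_lower))
#   other.TDM <- TermDocumentMatrix(textd, control = list(stopwords = TRUE))
```

In current versions of tm, lower-casing a whole corpus is done with tm_map(textd, content_transformer(tolower)) rather than a tmTolower() function. Alternatively, DirSource() takes an encoding argument (for example DirSource("other/docs", encoding = "latin1")), so the conversion to UTF-8 could happen when the files are first read.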