Hi,

I'm using version 0.5.1 of the tm package with R 2.10.1.  It looks to me
as if, after running the following,

    reuters21578 <- Corpus(DirSource(corpusDir),
                           readerControl = list(reader = readReut21578XMLasPlain))
    reuters21578 <- tm_map(reuters21578, stripWhitespace)
    reuters21578 <- tm_map(reuters21578, tolower)
    reuters21578 <- tm_map(reuters21578, removePunctuation)
    reuters21578 <- tm_map(reuters21578, removeNumbers)
    reuters21578.dtm <- DocumentTermMatrix(reuters21578)

reuters21578.dtm does not include terms from the Heading (e.g. the Title).

Can anyone confirm this? If so, is there an option to have the terms
from the Heading included?

Many thanks!

Cheers,
David
