I have a problem with the textmatrix() function of the lsa package whenever I specify removeNumbers=TRUE. The data for the function are stored in a directory, LSAwork, which consists of a series of files that house the text in column form. As long as removeNumbers=FALSE, or the argument is omitted, textmatrix() works just fine. The error message I get seems to suggest it is finding the files empty after filtering; however, all of the files are primarily words with only a few numbers mixed in. Any help appreciated.
The data I am using is the MEDLINE data set; the first file in the data set, med.000001, begins like this:

correlation between maternal and fetal plasma levels of glucose and free fatty acids . correlation coefficients have been determined between the

The command I am using, and the resulting error, look like this:

> dtm <- textmatrix(LSAwork, stemming=TRUE, stopwords=StopListm, minGlobFreq=1,
+                   minWordLength=2, removeNumbers=TRUE)
Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab,  :
  arguments imply differing number of rows: 1, 0
In addition: Warning message:
In FUN(c("LSAWork/med.000001", "LSAWork/med.000002", "LSAWork/med.000003",  :
  [textvector] - the file LSAWork/med.000001 contains no terms after filtering.

Triss Ashton
University of North Texas
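
P.S. In case a self-contained example is useful: the snippet below is roughly how I would expect to reproduce the behaviour with a single small file. The stopword vector is only a stand-in for my StopListm, and the text is the opening of med.000001; I have not re-run this exact snippet, so please treat it as a sketch rather than a verified reproduction.

library(lsa)

## write the start of med.000001 into a fresh temporary directory
tmpdir <- file.path(tempdir(), "LSAwork")
dir.create(tmpdir, showWarnings = FALSE)
writeLines(paste("correlation between maternal and fetal plasma levels of",
                 "glucose and free fatty acids . correlation coefficients",
                 "have been determined between the"),
           file.path(tmpdir, "med.000001"))

## small stand-in for my StopListm
mystops <- c("the", "of", "and", "between", "have", "been")

## this form works on my full MEDLINE directory
dtm1 <- textmatrix(tmpdir, stemming = TRUE, stopwords = mystops,
                   minGlobFreq = 1, minWordLength = 2)

## adding removeNumbers = TRUE is what gives me the
## "contains no terms after filtering" warning and the data.frame error
dtm2 <- textmatrix(tmpdir, stemming = TRUE, stopwords = mystops,
                   minGlobFreq = 1, minWordLength = 2, removeNumbers = TRUE)

A possible interim workaround would be to strip the digits from (copies of) the files myself before calling textmatrix(), along the lines below, but I would rather understand why removeNumbers behaves this way:

## strip digits from the temporary copies before building the matrix
for (f in list.files(tmpdir, full.names = TRUE)) {
    txt <- readLines(f, warn = FALSE)
    writeLines(gsub("[[:digit:]]+", " ", txt), f)
}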