Hello all, I have asked this question on many forums without getting a response. Although I have made some progress on my own, I am stuck on how to resolve a particular error message.
I have a question about text-analysis packages and code. I am trying to run readability analyses on a collection of about 4,000 Word files. I would like to run several kinds of analysis, but the problem right now is getting R to recognize the loaded files as data ready for analysis: every attempt produces an error. Here is what I have done so far.

Because the full set of 4,000 files was apparently too large to be read in one go, I divided it into three folders of roughly equal size, named 'WPSCASES' one through three. I run one command per folder; apart from the folder name, the commands are identical:

token <- tokenize("/Users/Gordon/Desktop/WPSCASESONE/", lang = "en", doc_id = "sample")
token2 <- tokenize("/Users/Gordon/Desktop/WPSCASES2/", lang = "en", doc_id = "sample")
token3 <- tokenize("/Users/Gordon/Desktop/WPSCASES3/", lang = "en", doc_id = "sample")

The first command fails with:

*Error in nchar(tagged.text[, "token"], type = "width") : invalid multibyte string, element 348*

The other two commands give the same error, except that the element number differs: 925 for the second folder and 4302 for the third.

I tried to work out whether the 'element' in the error message corresponds to the file at that position in the folder, but since the third folder does not contain 4,300 files, that seems unlikely.

Please let me know if you can see how to fix these errors, so that I can start using koRpus functions such as readability() and related ones.
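A possible way to narrow this down, offered as a sketch rather than a confirmed fix. The failing call is nchar(tagged.text[, "token"], type = "width"), so "element 348" most likely indexes the vector of tokens collected from all files in the folder, not the list of files, which would explain why the third folder can report 4302. Assuming tokenize() reads every file in the folder as plain text, Word documents are a likely source of invalid bytes, since .doc files are binary and .docx files are zipped XML. The helper find_invalid_utf8() below is hypothetical (it is not part of koRpus) and uses only base R (list.files(), dir.exists(), readLines(), validUTF8()) to list the files in a folder whose contents are not valid UTF-8.

```r
# Hypothetical helper, not part of koRpus: list the files in `dir` whose
# contents include byte sequences that are not valid UTF-8.
find_invalid_utf8 <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  # Non-recursive list.files() also returns subfolder names; skip them
  files <- files[!dir.exists(files)]
  has_bad <- vapply(files, function(f) {
    # warn = FALSE silences warnings about embedded nuls or a missing final
    # newline, both common when a binary Word file is read as text
    lines <- readLines(f, warn = FALSE)
    # validUTF8() (base R >= 3.3.0) checks the raw bytes of each line
    !all(validUTF8(lines))
  }, logical(1))
  basename(files[has_bad])
}
```

For example, find_invalid_utf8("/Users/Gordon/Desktop/WPSCASESONE/") would list the offending files in the first folder. If the list turns out to be every Word file, one option to try is converting the documents to plain .txt files saved as UTF-8 before calling tokenize().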
Thank you,
Gordon

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.