Check out the 'officer' package.

Thanks
Jim Holtman
*Data Munger Guru*

*What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.*

On Fri, Dec 29, 2023 at 10:14 AM Andy <phaedr...@gmail.com> wrote:

> Hello
>
> I am trying to work through a problem, but feel like I've gone down a
> rabbit hole. I'd very much appreciate any help.
>
> The task: I have several directories of multiple (some directories, up
> to 2,500+) *.docx files (newspaper articles downloaded from Lexis+) that
> I want to iterate through to append to a spreadsheet only those articles
> that satisfy a condition (i.e., a specific keyword is present for >= 50%
> coverage of the subject matter). Lexis+ has a very specific structure
> and keywords are given in the row "Subject".
>
> I'd like to be able to accomplish the following:
>
> (1) Append the title, the month, the author, the number of words, and
> page number(s) to a spreadsheet
>
> (2) Read each article and extract keywords (in the docs, these are
> listed in 'Subject' section as a list of keywords with a percentage
> showing the extent to which the keyword features in the article (e.g.,
> FAST FASHION (72%)) and to append the keyword and the % coverage to the
> same row in the spreadsheet. However, I want to ensure that the keyword
> coverage meets the threshold of >= 50%; if not, then pass onto the next
> article in the directory. Rinse and repeat for the entire directory.
>
> So far, I've tried working through some Stack Overflow-based solutions,
> but most seem to use the textreadr package, which is now deprecated;
> others use either the officer or the officedown packages. However, these
> packages don't appear to do what I want the program to do, at least not
> in any of the examples I have found, nor in the vignettes and relevant
> package manuals I've looked at.
>
> The first point is, is what I am intending to do even possible using R?
> If it is, then where do I start with this?
> If these docx files were converted to UTF-8 plain text, would that make
> the task easier?
>
> I am not a confident coder, and am really only just getting my head
> around R so appreciate a steep learning curve ahead, but of course, I
> don't know what I don't know, so any pointers in the right direction
> would be a big help.
>
> Many thanks in anticipation
>
> Andy
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
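
For the keyword step (2) in the quoted message, a rough sketch with 'officer' might look like the code below. It is only a starting point under stated assumptions, not a tested solution for real Lexis+ files: it assumes the keywords sit in a single paragraph that starts with "Subject" and are separated by semicolons, as in "Subject: FAST FASHION (72%); TEXTILES (30%)". The helper names parse_subject(), keyword_coverage(), subject_from_docx() and collect_matches() are made up for illustration; only read_docx() and docx_summary() come from the officer package. Title, month, author, word count and page numbers are not handled here.

```r
# Split a "Subject" paragraph into keyword / coverage pairs (base R only).
# Assumed input shape: "Subject: FAST FASHION (72%); TEXTILES (30%)"
parse_subject <- function(subject_text) {
  body  <- sub("^[[:space:]]*Subject:?[[:space:]]*", "", subject_text,
               ignore.case = TRUE)
  items <- trimws(strsplit(body, ";", fixed = TRUE)[[1]])
  # Keep only items that end in "(NN%)"
  items <- items[grepl("\\([0-9]+%\\)$", items)]
  data.frame(
    keyword  = sub("[[:space:]]*\\([0-9]+%\\)$", "", items),
    coverage = as.integer(sub("^.*\\(([0-9]+)%\\)$", "\\1", items)),
    stringsAsFactors = FALSE
  )
}

# Return the row for one keyword if its coverage meets the threshold,
# otherwise a zero-row data frame. Matching ignores case.
keyword_coverage <- function(subject_text, keyword, threshold = 50) {
  kw   <- parse_subject(subject_text)
  keep <- toupper(kw$keyword) == toupper(keyword) & kw$coverage >= threshold
  kw[keep, , drop = FALSE]
}

# Pull the first paragraph starting with "Subject" out of one .docx file.
# Assumption: the Subject text is a paragraph of its own in the document.
subject_from_docx <- function(path) {
  paragraphs <- officer::docx_summary(officer::read_docx(path))$text
  hit <- grep("^[[:space:]]*Subject", paragraphs, value = TRUE)
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Walk a directory and collect one row per article that meets the threshold.
# Returns NULL when no article qualifies.
collect_matches <- function(dir, keyword, threshold = 50) {
  files <- list.files(dir, pattern = "\\.docx$", full.names = TRUE)
  rows <- lapply(files, function(f) {
    subject <- subject_from_docx(f)
    if (is.na(subject)) return(NULL)
    hit <- keyword_coverage(subject, keyword, threshold)
    if (nrow(hit) == 0) return(NULL)   # below threshold: skip this article
    data.frame(file = basename(f), hit, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

With a directory of articles (the path here is a placeholder), something like `res <- collect_matches("path/to/articles", "FAST FASHION")` followed by `write.csv(res, "matches.csv", row.names = FALSE)` would give a file that opens in a spreadsheet. It is worth printing `docx_summary()` for one real file first to see where the Subject text actually ends up, since that drives the rest.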