A warning to the OP: PDF files are like packages. A wide variety of things can be inside, including text stored in semi-random order or bitmap images of text. A tool that extracts text from the file will therefore only be useful if your PDF files happen to be the kind that contain reasonably unscrambled text.
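One rough way to find out which kind of PDF you have is to extract the text and count what comes back for each page. The sketch below assumes the pdftools package is installed; the file name `traffic_counts.pdf`, the helper `classify_pages()`, and the 20-character threshold are all made up for illustration. It only flags pages that yield little or no text (likely scanned images). It cannot tell you whether the text that does come out is in a sensible order, so you still need to read a page or two yourself.

```r
# Classify pages of text already extracted from a PDF.
# `pages` is a character vector with one element per page, which is the
# shape that pdftools::pdf_text() returns.
# `min_chars` is an arbitrary illustrative threshold, not a recommended value.
classify_pages <- function(pages, min_chars = 20) {
  # Count the non-whitespace characters on each page.
  n_chars <- nchar(gsub("\\s", "", pages))
  data.frame(
    page = seq_along(pages),
    n_chars = n_chars,
    has_text = n_chars >= min_chars
  )
}

# Hypothetical file name; replace it with one of your own PDFs.
pdf_file <- "traffic_counts.pdf"

# Only attempt the extraction if pdftools is installed and the file exists.
if (requireNamespace("pdftools", quietly = TRUE) && file.exists(pdf_file)) {
  pages <- pdftools::pdf_text(pdf_file)
  print(classify_pages(pages))
}
```

Pages where `has_text` is `FALSE` are probably images and would need OCR rather than text extraction.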
On January 23, 2018 11:35:38 PM PST, Ulrik Stervbo <ulrik.ster...@gmail.com> wrote:
>I think I would use pdftk to extract the form data. All subsequent
>manipulation in R.
>
>HTH
>Ulrik
>
>Eric Berger <ericjber...@gmail.com> wrote on Wed, Jan 24, 2018, 08:11:
>
>> Hi Scott,
>> I have never done this myself but I read something recently on the
>> r-help distribution that was related.
>> I just did a quick search and found a few hits that might work for you.
>>
>> 1. https://medium.com/@CharlesBordet/how-to-extract-and-clean-data-from-pdf-files-in-r-da11964e252e
>> 2. http://bxhorn.com/2016/extract-data-tables-from-pdf-files-in-r/
>> 3. https://www.rdocumentation.org/packages/textreadr/versions/0.7.0/topics/read_pdf
>>
>> HTH,
>> Eric
>>
>> On Wed, Jan 24, 2018 at 3:58 AM, Scott Clausen <scottclau...@mac.com> wrote:
>> > Hello,
>> >
>> > I’m new to R and am using it with RStudio to learn the language. I’m
>> > doing so as I have quite a lot of traffic data I would like to explore.
>> > My problem is that all the data is located on a number of PDFs. Can
>> > someone point me to info on gathering data from other sources? I’ve
>> > been to the R FAQ and didn’t see anything and would appreciate your
>> > thoughts.
>> >
>> > I am quite sure now that often, very often, in matters concerning
>> > religion and politics a man's reasoning powers are not above the
>> > monkey's.
>> >
>> > -- Mark Twain
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.