Colleagues,
I can extract specific data from lines in a PDF using:

    library(pdftools)
    pdf_text("10619.pdf")
    txt <- pdf_text(".pdf")
    write.table(txt, file = "mydata.txt")
    con <- file("mydata.txt")
    open(con)
    # Each read.table() call continues from where the previous one stopped,
    # so every skip value is relative to the line read before it.
    serial    <- read.table(con, skip = 5,  nrows = 1)  # Extract [3]
    flatness  <- read.table(con, skip = 11, nrows = 1)  # Extract [5]
    parallel1 <- read.table(con, skip = 2,  nrows = 1)  # Extract [5]
    parallel2 <- read.table(con, skip = 4,  nrows = 1)  # Extract [5]
    close(con)

    # Note that serial has 4 variables,
    # flatness has 6 variables,
    # parallel1 has 5 variables,
    # parallel2 has 5 variables.

    # This outputs the specific data I need:
    serial[3]
    flatness[5]
    parallel1[5]
    parallel2[5]

Note that in the txt format parallel1[5] shows 0.0007 and parallel2[5] shows 0.0006, not scientific notation. Is there a way to format these values so that they display as in the original data?

I'd like to extend this code to all of the PDF files in a directory and to generate a table of all the serial, flatness, parallel1 and parallel2 data. I'm not having a lot of success trying to build the script for this. Some pointers would be appreciated.

All the best.

Thomas Subia
Statistician / Senior Quality Engineer
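
One possible direction for the directory-wide version, offered only as a rough sketch: split the text returned by pdf_text() into lines in memory rather than going through an intermediate text file, and keep each extracted field as a character string so that it is shown exactly as it appears in the PDF. The function names (get_field, extract_record, extract_all) and the line and field positions are hypothetical placeholders, not taken from the actual reports; the positions would need to be checked against the real page layout, and the sketch assumes the values of interest are all on the first page of each file.

```r
# Pull one whitespace-separated field from one line of a page's text.
# `line_no` and `field_no` are hypothetical positions; adjust to the real layout.
get_field <- function(lines, line_no, field_no) {
  if (line_no > length(lines)) return(NA_character_)
  fields <- strsplit(trimws(lines[line_no]), "[[:space:]]+")[[1]]
  if (field_no > length(fields)) return(NA_character_)
  fields[field_no]
}

# Turn the text of one PDF page into a one-row data frame of character values.
# The default positions are placeholders (assumptions), not verified values.
extract_record <- function(page_text,
                           positions = list(serial    = c(line = 6,  field = 3),
                                            flatness  = c(line = 18, field = 5),
                                            parallel1 = c(line = 21, field = 5),
                                            parallel2 = c(line = 26, field = 5))) {
  lines <- strsplit(page_text, "\n", fixed = TRUE)[[1]]
  values <- lapply(positions, function(p) get_field(lines, p[["line"]], p[["field"]]))
  as.data.frame(values, stringsAsFactors = FALSE)
}

# Apply extract_record() to the first page of every PDF in `dir` and
# stack the results into one table. Returns NULL if no PDF files are found.
extract_all <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE, ignore.case = TRUE)
  rows <- lapply(files, function(f) {
    rec <- extract_record(pdftools::pdf_text(f)[1])
    data.frame(file = basename(f), rec, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

With the positions adjusted, something like results <- extract_all("path/to/pdfs") followed by write.csv(results, "results.csv", row.names = FALSE) would give the combined table. Because the fields stay as text, no reformatting takes place; if numeric values are needed later, as.numeric() converts them, and format(x, scientific = TRUE) prints a number in scientific notation.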