[R] Extract lines from pdf files

Thomas Subia via R-help Tue, 19 Nov 2019 14:53:27 -0800


Colleagues,


 

I can extract specific data from lines in a pdf using:

 

library(pdftools)

pdf_text("10619.pdf")

txt <- pdf_text(".pdf")

write.table(txt,file="mydata.txt")

con <- file('mydata.txt')

open(con)

serial <- read.table(con,skip=5,nrow=1) # Extract [3]

flatness <- read.table(con,skip=11,nrow=1) # Extract [5]

parallel1 <-read.table(con,skip=2,nrow=1)# Extract [5]

parallel2 <-read.table(con,skip=4,nrow=1)# Extract [5]

close(con)

 

# note here that serial has 4 variables

# flatness has 6 variables

# parallel1 has 5 variables

# parallel2 has 5 variables

 

# this outputs the specific data I need

serial[3]

flatness[5]

parallel1[5] # Note: the txt format shows 0.0007, not scientific notation.
# Is there a way to format this to display the original data?

parallel2[5] # Note: the txt format shows 0.0006, not scientific notation.
# Is there a way to format this to display the original data?
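
For reference, a minimal sketch of two possible approaches to the formatting question, using base R only. The sample line passed to read.table() is made up for illustration and is not taken from the actual pdf; whether the original value was written as "7.0E-04" is an assumption.

```r
x <- 0.0007

# Option 1: re-format a numeric value in scientific notation for display.
sci_short <- format(x, scientific = TRUE)          # "7e-04"
sci_fixed <- formatC(x, format = "e", digits = 1)  # "7.0e-04"

# Option 2: read every field as text, so nothing is converted to numeric
# and the value keeps exactly the characters found in the file.
# (Hypothetical sample line, for illustration only.)
raw <- read.table(text = "a b c d 7.0E-04", colClasses = "character")
original <- raw[[5]]                               # "7.0E-04"
```

Option 1 only changes how a number is printed; Option 2 is probably closer to "display the original data", at the cost of having to call as.numeric() before doing any arithmetic.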

 

I'd like to extend this code to all of the pdf files in a directory and to 
generate a table of all the serial, flatness, parallel1 and parallel2 data.

I'm not having a lot of success trying to build the script for this. Some 
pointers would be appreciated.
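
Roughly, the shape of the script I have in mind is sketched below. It skips the intermediate mydata.txt file and works on the pdf_text() output directly. The helper names (get_token, extract_fields, extract_dir) are hypothetical, and the line numbers in `where` are placeholders guessed from the skip values above; they would need to be checked against the real pdf_text() output. It also assumes every pdf has the same layout and that the data sit on page 1.

```r
# Sketch only: line numbers and token positions are assumptions.

# Return the k-th whitespace-separated token on line n of one page of text.
# The value stays a character string, so its formatting is not changed.
get_token <- function(page_text, n, k) {
  lines <- strsplit(page_text, "\n", fixed = TRUE)[[1]]
  if (n > length(lines)) return(NA_character_)
  tokens <- strsplit(trimws(lines[n]), "\\s+")[[1]]
  if (k > length(tokens)) NA_character_ else tokens[k]
}

# Build a one-row data frame with the four fields from one page of text.
extract_fields <- function(page_text,
                           where = list(serial    = c(line = 5,  token = 3),
                                        flatness  = c(line = 17, token = 5),
                                        parallel1 = c(line = 20, token = 5),
                                        parallel2 = c(line = 25, token = 5))) {
  vals <- lapply(where, function(w) {
    get_token(page_text, w[["line"]], w[["token"]])
  })
  as.data.frame(vals, stringsAsFactors = FALSE)
}

# Apply extract_fields() to page 1 of every pdf in a directory and
# stack the results into one table, one row per file.
extract_dir <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE,
                      ignore.case = TRUE)
  rows <- lapply(files, function(f) {
    page1 <- pdftools::pdf_text(f)[1]
    data.frame(file = basename(f), extract_fields(page1),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # NULL if the directory holds no pdf files
}
```

The intended use would be something like results <- extract_dir("."), followed by write.csv(results, "results.csv", row.names = FALSE) to save the table.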
All the best.
 
Thomas Subia

Statistician / Senior Quality Engineer




______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
