Please don't spam the mailing list, especially with HTML-formatted messages.
See the Posting Guide.

PDF is designed to present content visually, not to store it as structured
data. A generator can place every character on the page in arbitrary order and
the page will still look right, while the underlying text is nearly impossible
to extract in reading order. I have also encountered many PDF files in which
the same text is placed on the page multiple times, which again scrambles any
attempt to read it programmatically. Tools like the "pdftools" package can
sometimes work when the program that generated the file does so in a simple,
extraction-friendly way, but there are no guarantees, and your description
suggests that you likely won't be able to accomplish your goal with this file.
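
If the files do turn out to be extraction-friendly, a loop over the directory
could look roughly like the sketch below. It is only an illustration under
strong assumptions: every PDF is assumed to share one layout, the values are
assumed to sit on page 1, and the line and field positions in `positions` are
hypothetical placeholders, not taken from your files. The helper names
(`extract_field`, `extract_report`, `read_reports`) are made up for this
example; only `pdf_text()` comes from pdftools. Values are kept as character
strings so that whatever notation appears in the PDF is carried through
unchanged.

```r
# Return whitespace-separated field `field` from line `line_no` of one page
# of text, or NA if that line or field does not exist.
extract_field <- function(page_text, line_no, field) {
  lines <- strsplit(page_text, "\n", fixed = TRUE)[[1]]
  if (line_no > length(lines)) return(NA_character_)
  tokens <- strsplit(trimws(lines[line_no]), "[[:space:]]+")[[1]]
  if (field > length(tokens)) return(NA_character_)
  tokens[field]
}

# HYPOTHETICAL positions: replace with the lines/fields of the real layout.
positions <- list(
  serial    = c(line = 6,  field = 3),
  flatness  = c(line = 12, field = 5),
  parallel1 = c(line = 15, field = 5),
  parallel2 = c(line = 20, field = 5)
)

# Build a one-row data frame of character values for one page of text.
extract_report <- function(page_text, positions) {
  vals <- vapply(
    positions,
    function(p) extract_field(page_text, p[["line"]], p[["field"]]),
    character(1)
  )
  as.data.frame(as.list(vals), stringsAsFactors = FALSE)
}

# Apply the extraction to page 1 of every PDF in `dir` and stack the rows.
read_reports <- function(dir, positions) {
  if (!requireNamespace("pdftools", quietly = TRUE)) {
    stop("the pdftools package is required")
  }
  files <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE,
                      ignore.case = TRUE)
  rows <- lapply(files, function(f) {
    page1 <- pdftools::pdf_text(f)[1]
    data.frame(file = basename(f), extract_report(page1, positions),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

Calling `read_reports(".", positions)` would then give one row per file. An NA
in any column means the assumed position did not exist in that file, which is
a quick way to spot PDFs whose layout differs. If numbers are needed later,
`as.numeric()` converts the strings, and `format(x, scientific = TRUE)`
controls how they are displayed.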

On November 19, 2019 11:52:20 PM GMT+01:00, Thomas Subia via R-help 
<r-help@r-project.org> wrote:
>
>Colleagues,
>
> 
>
>I can extract specific data from lines in a pdf using:
>
> 
>
>library(pdftools)
>
>pdf_text("10619.pdf")
>
>txt <- pdf_text(".pdf")
>
>write.table(txt,file="mydata.txt")
>
>con <- file('mydata.txt')
>
>open(con)
>
>serial <- read.table(con,skip=5,nrow=1) # Extract [3]
>
>flatness <- read.table(con,skip=11,nrow=1) # Extract [5]
>
>parallel1 <-read.table(con,skip=2,nrow=1)# Extract [5]
>
>parallel2 <-read.table(con,skip=4,nrow=1)# Extract [5]
>
>close(con)
>
> 
>
># note here that serial has 4 variables
>
># flatness had 6 variables
>
># parallel1 has 5 variables
>
># parallel2 has 5 variables
>
> 
>
># this outputs the specific data I need
>
>serial[3]
>
>flatness[5]
>
>parallel1[5] # Note here that the txt format shows 0.0007, not
># scientific notation. Is there a way to format this to display the
># original data?
>
>parallel2[5] # Note here that the txt format shows 0.0006, not
># scientific notation. Is there a way to format this to display the
># original data?
>
> 
>
>I'd like to extend this code to all of the pdf files in a directory and
>to generate a table of all the serial, flatness, parallel1 and parallel2
>data.
>
>I'm not having a lot of success trying to build the script for this.
>Some pointers would be appreciated.
>All the best.
> 
>Thomas Subia
>
>Statistician / Senior Quality Engineer
>
>
>
>       [[alternative HTML version deleted]]
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.
