Hi Hugh,

I have been planning to do some Go work with PDF files, so your email prompted 
me to do some research.

Not sure if using heussd/pdftotext-go is critical to you, or if you are just 
trying to read text in a PDF. I tried to get pdftotext installed, but my dev 
laptop is still running macOS Monterey and I couldn't get it working, so I 
looked for other options.

If you are just interested in reading PDF text and do not have a specific need 
to use pdftotext, then one of the others I looked at might work. I came across 
a package originally developed by Russ Cox that was forked by many others. To 
evaluate it, I forked one of those and converted it from using a reader to 
returning a slice of strings so I could easily split out the new lines. (I 
could probably have made it work with the reader, but I was just going for 
quick.)
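
For what it's worth, keeping a reader and still getting a slice of lines is 
not much code with the standard library. This is only a rough sketch of the 
idea, not what my fork does: readLines and linesOf are names I made up for 
illustration, and the sample text just stands in for whatever a PDF package 
hands back.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readLines (hypothetical helper) consumes r and returns its content as a
// slice of lines with the line terminators removed.
func readLines(r io.Reader) ([]string, error) {
	var lines []string
	sc := bufio.NewScanner(r)
	// Raise the per-line limit from the 64 KiB default to 1 MiB, since
	// extracted PDF text can contain very long lines.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

// linesOf (hypothetical helper) is a convenience wrapper for text that is
// already in memory. The error is ignored here for brevity.
func linesOf(text string) []string {
	lines, _ := readLines(strings.NewReader(text))
	return lines
}

func main() {
	// Sample text standing in for content extracted from a PDF.
	for i, line := range linesOf("Item  Qty\nApple  3\nPear  12\n") {
		fmt.Printf("%d: %s\n", i+1, line)
	}
}
```

Anything that implements io.Reader can be passed to readLines, so the same 
helper would work on a file, a buffer, or the output of another program.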

If you think it can help your use-case, please check it out (but be aware, my 
additions to the forked code are rather hacky):

https://github.com/mikeschinkel/go-pdf-content-reader
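
On the CSV part of your question: once you have one string per table row, a 
crude way to get columns is to split each line on runs of two or more spaces. 
This is just a sketch under a big assumption, namely that the extracted lines 
keep the table's columns separated by at least two spaces and that no cell 
contains a double space itself. splitColumns and the sample rows are made up 
for illustration and are not part of any PDF package.

```go
package main

import (
	"encoding/csv"
	"log"
	"os"
	"regexp"
	"strings"
)

// columnGap matches a run of two or more whitespace characters, which this
// sketch ASSUMES is what separates table columns in the extracted text.
var columnGap = regexp.MustCompile(`\s{2,}`)

// splitColumns (hypothetical helper) trims a line and splits it into cells
// at every column gap. Single spaces inside a cell are preserved.
func splitColumns(line string) []string {
	return columnGap.Split(strings.TrimSpace(line), -1)
}

func main() {
	// Sample rows standing in for lines extracted from a PDF table.
	lines := []string{
		"Item        Qty   Price",
		"Red apple   3     1.50",
		"",
		"Pear        12    0.80",
	}

	w := csv.NewWriter(os.Stdout)
	for _, line := range lines {
		if strings.TrimSpace(line) == "" {
			continue // skip blank lines between rows
		}
		if err := w.Write(splitColumns(line)); err != nil {
			log.Fatalf("Failed to write CSV row: %v", err)
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		log.Fatalf("Failed to flush CSV output: %v", err)
	}
}
```

Real PDFs are messier than this (wrapped cells, empty cells, columns that 
are only one space apart), so treat it as a starting point at best.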

-Mike

> On Jan 22, 2025, at 11:08 AM, Hugh Myrie <hugh.my...@gmail.com> wrote:
> 
> I want to extract text from a PDF and preserve any tables, or at least 
> convert them to CSV. I am using the PDFtoText package (which uses the Poppler 
> software). The text is extracted vertically (i.e. one column at a time) and 
> each piece of text is separated by a space. There are no line breaks, which 
> makes it difficult to manipulate. I want to extract the text horizontally to 
> preserve, and possibly add, line breaks to allow for further manipulation.
> 
> Your help in this matter is appreciated. Suggest alternatives if available.
> 
> Here is the Go code:
> 
> package main
> 
> import (
>     "fmt"
>     "log"
>     "os"
> 
>     pdftotext "github.com/heussd/pdftotext-go"
> )
> 
> func main() {
>     // Replace "test.pdf" with the path to your PDF file
>     pdfPath := "test.pdf"
>     // Read the whole file into memory; os.ReadFile opens and closes the
>     // file itself, so a separate os.Open call is not needed
>     content, err := os.ReadFile(pdfPath)
>     if err != nil {
>         log.Fatalf("Failed to read PDF file: %v", err)
>     }
>     // Extract text from the PDF file
>     text, err := pdftotext.Extract(content)
>     if err != nil {
>         log.Fatalf("Failed to extract text from PDF file: %v", err)
>     }
>     // Print the extracted text
>     fmt.Println(text)
> }
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion visit 
> https://groups.google.com/d/msgid/golang-nuts/c19e212d-a81f-4525-ae0d-a9abb0b292fbn%40googlegroups.com.

