I want to extract text from a PDF while preserving any tables, or at least 
convert them to CSV. I am using the pdftotext-go package (which uses the 
Poppler software). The text is extracted vertically (i.e. one column at a 
time), with each piece of text separated by a space. There are no line 
breaks, which makes the output difficult to manipulate. I want to extract 
the text horizontally (row by row) to preserve the table rows, and possibly 
add line breaks to allow for further manipulation.

Any help is appreciated, as are suggestions for alternative approaches.

Here is the Go code:

package main

import (
    "fmt"
    "log"
    "os"

    pdftotext "github.com/heussd/pdftotext-go"
)

func main() {
    // Replace "test.pdf" with the path to your PDF file
    pdfPath := "test.pdf"
    // Read the whole file into memory; Extract is passed the PDF as a byte slice
    content, err := os.ReadFile(pdfPath)
    if err != nil {
        log.Fatalf("Failed to read PDF file: %v", err)
    }
    // Extract text from the PDF file
    text, err := pdftotext.Extract(content)
    if err != nil {
        log.Fatalf("Failed to extract text from PDF file: %v", err)
    }
    // Print the extracted text
    fmt.Println(text)
}
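
For reference, here is a minimal sketch of one possible alternative, not a 
confirmed fix. It assumes the Poppler pdftotext binary is installed and on 
the PATH, and calls it directly with its -layout option, which tries to 
keep the physical layout of the page so that each table row stays on one 
line. It also assumes that columns end up separated by two or more spaces, 
which will not hold for every PDF. The names layoutText, splitColumns and 
columnGap are made up for this illustration.

```go
package main

import (
	"encoding/csv"
	"log"
	"os"
	"os/exec"
	"regexp"
	"strings"
)

// columnGap matches runs of two or more spaces. Treating such a run as a
// column boundary is an assumption about how -layout pads the table.
var columnGap = regexp.MustCompile(` {2,}`)

// splitColumns splits one layout-preserved line into cells.
// Single spaces inside a cell (e.g. "Unit Price") are kept.
func splitColumns(line string) []string {
	return columnGap.Split(strings.TrimSpace(line), -1)
}

// layoutText runs pdftotext with -layout and returns its output.
// Passing "-" as the output file makes pdftotext write to stdout.
func layoutText(pdfPath string) (string, error) {
	out, err := exec.Command("pdftotext", "-layout", pdfPath, "-").Output()
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	text, err := layoutText("test.pdf")
	if err != nil {
		log.Fatalf("pdftotext failed: %v", err)
	}

	w := csv.NewWriter(os.Stdout)
	for _, line := range strings.Split(text, "\n") {
		// Skip blank lines (including lines holding only a page break).
		if strings.TrimSpace(line) == "" {
			continue
		}
		if err := w.Write(splitColumns(line)); err != nil {
			log.Fatalf("Failed to write CSV row: %v", err)
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		log.Fatalf("Failed to flush CSV output: %v", err)
	}
}
```

Non-table lines (headings, paragraphs) will come out as single-cell rows, 
so the CSV would likely need filtering afterwards, for example by keeping 
only rows with the expected number of cells.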
