The documentation for the Extract function
(https://pkg.go.dev/github.com/heussd/pdftotext-go#Extract) says only:

    Extract PDF text content in simplified format

That might mean it returns plain text only and does not preserve tables
or other layout. You might get better support by opening a GitHub issue at
https://github.com/heussd/pdftotext-go/issues and asking the maintainer
for more information.
Also, an LLM such as Gemini or ChatGPT might point you in a good direction:
* https://g.co/gemini/share/e146a8428b63
* https://chatgpt.com/share/679166c7-3c9c-8008-9d21-e97ae2d90f4a
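
One alternative that might be worth trying, assuming the Poppler pdftotext
binary is installed and on your PATH: call it directly with its -layout
flag, which asks it to keep the original physical layout as best it can, so
table rows tend to come out as lines. The sketch below does that and then
splits each line into cells wherever two or more spaces appear. The helper
names (extractLayout, layoutToRows, rowsToCSV) are made up for illustration,
and the two-space rule is only a heuristic of mine; it will misfire on
tables with empty cells or tightly packed columns.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"os/exec"
	"regexp"
	"strings"
)

// columnGap matches runs of two or more spaces. Treating such a run as a
// column separator is an assumption about how the -layout output looks.
var columnGap = regexp.MustCompile(` {2,}`)

// extractLayout (hypothetical helper) runs `pdftotext -layout <path> -`
// and returns what the tool writes to stdout.
func extractLayout(pdfPath string) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command("pdftotext", "-layout", pdfPath, "-")
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("pdftotext failed: %w: %s", err, stderr.String())
	}
	return stdout.String(), nil
}

// layoutToRows (hypothetical helper) turns layout-preserved text into rows
// of cells: one row per non-blank line, cells split at column gaps.
func layoutToRows(text string) [][]string {
	var rows [][]string
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line) // also drops \r and page-break \f
		if line == "" {
			continue
		}
		rows = append(rows, columnGap.Split(line, -1))
	}
	return rows
}

// rowsToCSV (hypothetical helper) encodes the rows with encoding/csv.
func rowsToCSV(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.WriteAll(rows); err != nil { // WriteAll also flushes
		return "", err
	}
	return buf.String(), nil
}
```

From main you would call extractLayout("test.pdf"), pass the result to
layoutToRows, and print the output of rowsToCSV. If the columns still come
out wrong, it may be worth inspecting the raw -layout text first to see how
the table is actually spaced.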

On Wednesday, January 22, 2025 at 10:08:51 AM UTC-6 Hugh Myrie wrote:

> I want to extract text from a PDF and preserve any table or at least 
> convert it to a CSV. I am using the PDFtoText package (which uses the 
> Poppler software). The text is extracted vertically (i.e. one column at a 
> time) and each text is separated by a space. There is no line break making 
> it difficult to manipulate. I want to extract the text horizontally to 
> preserve and possibly add line breaks to allow for further manipulation.
>
> Your help in this matter is appreciated. Suggest alternatives if available.
>
> Here is the Go code:
>
> package main
>
> import (
>     "fmt"
>     "log"
>     "os"
>
>     pdftotext "github.com/heussd/pdftotext-go"
> )
>
> func main() {
>     // Replace "test.pdf" with the path to your PDF file
>     pdfPath := "test.pdf"
>     // Read the whole PDF into memory. os.ReadFile opens and closes the
>     // file itself, so a separate os.Open call is not needed.
>     content, err := os.ReadFile(pdfPath)
>     if err != nil {
>         log.Fatalf("Failed to read PDF file: %v", err)
>     }
>     // Extract text from the PDF file
>     text, err := pdftotext.Extract(content)
>     if err != nil {
>         log.Fatalf("Failed to extract text from PDF file: %v", err)
>     }
>     // Print the extracted text
>     fmt.Println(text)
> }
>
>
