I want to extract text from a PDF and preserve any tables, or at least convert them to CSV. I am using the pdftotext-go package (which uses the Poppler software). The text is extracted vertically (i.e. one column at a time), with each piece of text separated by a space. There are no line breaks, which makes the output difficult to manipulate. I want to extract the text horizontally (row by row) to preserve the table layout, and possibly add line breaks to allow for further manipulation.
Your help in this matter is appreciated. Please suggest alternatives if available. Here is the Go code:

package main

import (
	"fmt"
	"log"
	"os"

	pdftotext "github.com/heussd/pdftotext-go"
)

func main() {
	// Replace "test.pdf" with the path to your PDF file
	pdfPath := "test.pdf"

	// Open the PDF file
	f, err := os.Open(pdfPath)
	if err != nil {
		log.Fatalf("Failed to open PDF file: %v", err)
	}
	defer f.Close()

	// Read the file content
	content, err := os.ReadFile(pdfPath)
	if err != nil {
		log.Fatalf("Failed to read PDF file: %v", err)
	}

	// Extract text from the PDF file
	text, err := pdftotext.Extract(content)
	if err != nil {
		log.Fatalf("Failed to extract text from PDF file: %v", err)
	}

	// Print the extracted text
	fmt.Println(text)
}
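For comparison, here is a rough sketch of one alternative that may be worth trying: calling the Poppler pdftotext command-line tool directly with its -layout option, which keeps the physical layout of the page so that table rows stay on one line, and then splitting each line into cells. This is only a sketch under assumptions: it assumes the pdftotext binary is installed and on the PATH, it treats any run of two or more spaces as a column boundary (a heuristic that will misfire on cells that themselves contain wide gaps or on empty cells), and the names extractLayout, layoutToRows and columnGap, as well as the built-in sample text, are made up for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"os/exec"
	"regexp"
	"strings"
)

// columnGap matches runs of two or more spaces. With -layout output these
// usually separate table columns, but this is a heuristic, not a guarantee.
var columnGap = regexp.MustCompile(` {2,}`)

// extractLayout (hypothetical helper) runs the Poppler pdftotext CLI with
// -layout and returns what it writes to stdout ("-" as the output file).
func extractLayout(pdfPath string) (string, error) {
	out, err := exec.Command("pdftotext", "-layout", pdfPath, "-").Output()
	if err != nil {
		return "", fmt.Errorf("pdftotext -layout: %w", err)
	}
	return string(out), nil
}

// layoutToRows (hypothetical helper) splits layout-preserved text into rows
// of cells: one row per non-blank line, one cell per column-gap-separated run.
func layoutToRows(text string) [][]string {
	var rows [][]string
	for _, line := range strings.Split(text, "\n") {
		// TrimSpace also drops the form feed that separates pages.
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		rows = append(rows, columnGap.Split(line, -1))
	}
	return rows
}

func main() {
	// Made-up sample of what -layout output might look like for a small table.
	text := "Name      Qty   Price\nWidget A  2     9.99\n"

	// If a PDF path is given on the command line, use real pdftotext output.
	if len(os.Args) > 1 {
		var err error
		text, err = extractLayout(os.Args[1])
		if err != nil {
			log.Fatal(err)
		}
	}

	// Write the rows as CSV to stdout; WriteAll flushes the writer.
	w := csv.NewWriter(os.Stdout)
	if err := w.WriteAll(layoutToRows(text)); err != nil {
		log.Fatal(err)
	}
}
```

Run with a PDF path as the first argument (for example "go run . test.pdf") to process a real file; without an argument it only converts the built-in sample. Rows may end up with different numbers of cells when the heuristic splits a line unevenly, so the CSV will probably need checking by hand for anything but simple tables.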