This question is informational. I use the PDFBox utilities to extract text from a large PDF file. Every page of the PDF uses a three-column layout. The PDFBox CLI utility works wonderfully here, since it reads each column from top to bottom and takes the columns from left to right.
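To make "column break" and "page break" concrete, here is a rough sketch of the kind of detection I have in mind. It is plain Java and does not call PDFBox at all: `ColumnBreakSketch`, `Pos`, and both thresholds are hypothetical names and values made up for illustration, and it assumes each extracted text chunk comes with a page number, an x offset from the left edge, and a y offset measured downward from the top edge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: labels each text chunk
// with the kind of break (if any) that precedes it.
class ColumnBreakSketch {

    enum Break { NONE, COLUMN, PAGE }

    // Position of one text chunk. Assumption: y grows downward from the top of the page.
    static final class Pos {
        final int page;
        final double x;
        final double y;

        Pos(int page, double x, double y) {
            this.page = page;
            this.x = x;
            this.y = y;
        }
    }

    // Assumed thresholds in points; real documents would need tuning.
    static final double MIN_COLUMN_SHIFT = 50.0;
    static final double MIN_UPWARD_JUMP = 50.0;

    // Chunks must be supplied in reading order (down each column, columns left to right).
    static List<Break> classify(List<Pos> positions) {
        List<Break> result = new ArrayList<>();
        Pos prev = null;
        for (Pos cur : positions) {
            if (prev == null) {
                result.add(Break.NONE);            // first chunk: nothing to compare against
            } else if (cur.page != prev.page) {
                result.add(Break.PAGE);            // page number changed
            } else if (cur.x - prev.x > MIN_COLUMN_SHIFT
                    && prev.y - cur.y > MIN_UPWARD_JUMP) {
                result.add(Break.COLUMN);          // moved right and jumped back up the page
            } else {
                result.add(Break.NONE);
            }
            prev = cur;
        }
        return result;
    }
}
```

The idea is only that a new column shows up as a jump to the right combined with a jump back toward the top of the same page, while a new page shows up as a change in page number.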
Is there a way to have Apache PDFBox recognize column breaks (the start of a new column) and page breaks (the start of a new page) as the text is being extracted? Thanks, Bob