This is an informational question. I use the PDFBox utilities to extract text
from a large PDF file. Every page of the PDF uses a three-column layout. The
PDFBox CLI utility is wonderful because it processes the columns from top to
bottom and left to right.

Is there a way to use Apache PDFBox to recognize column breaks (the start of a
new column) and page breaks (the start of a new page) as the text is being
extracted?
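One possible approach, sketched here under stated assumptions. For page breaks, PDFTextStripper already offers setPageStart(String), which writes a marker at the start of each page. For column breaks, the sketch assumes (as with the CLI output) that text arrives column by column, top to bottom, so a new column shows up as a large upward jump in the y coordinate of a line's first glyph. In PDFBox those coordinates could be collected by overriding writeString(String, List<TextPosition>) in a PDFTextStripper subclass and reading getYDirAdj() from the first TextPosition of each line. The class ColumnBreakDetector, its Line record, the marker strings, and the 50-point threshold are hypothetical and not part of PDFBox; the code is plain Java so it runs without the library.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a PDFBox class): inserts page- and column-break
 * markers into a sequence of extracted lines. Each line carries its page
 * number and the y coordinate of its first glyph, using a top-left origin
 * with y growing downward (the convention of TextPosition.getYDirAdj()).
 */
public class ColumnBreakDetector {

    /** One extracted line: page number, y of its first glyph, and its text. */
    public record Line(int page, float y, String text) {}

    public static final String PAGE_BREAK = "[PAGE BREAK]";
    public static final String COLUMN_BREAK = "[COLUMN BREAK]";

    /**
     * Returns the lines' text with markers inserted. A column break is assumed
     * whenever a line on the same page starts more than jumpThreshold units
     * above the previous line, i.e. reading jumped back toward the top.
     */
    public static List<String> annotate(List<Line> lines, float jumpThreshold) {
        List<String> out = new ArrayList<>();
        Line prev = null;
        for (Line line : lines) {
            if (prev == null || line.page() != prev.page()) {
                // Every new page, including the first, gets a page marker.
                out.add(PAGE_BREAK);
            } else if (line.y() < prev.y() - jumpThreshold) {
                // Smaller y means higher on the page: a large upward jump
                // suggests the next column has started.
                out.add(COLUMN_BREAK);
            }
            out.add(line.text());
            prev = line;
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample coordinates in points: columns start near y=72, end near y=720.
        List<Line> lines = List.of(
            new Line(1, 72f, "col 1, first line"),
            new Line(1, 720f, "col 1, last line"),
            new Line(1, 72f, "col 2, first line"),
            new Line(1, 720f, "col 2, last line"),
            new Line(2, 72f, "page 2, col 1, first line"));
        annotate(lines, 50f).forEach(System.out::println);
    }
}
```

Running main prints a page marker before the first line, a column marker before "col 2, first line", and another page marker before the page 2 line. A fixed threshold is only a heuristic: headings that span all three columns, figures, or footnotes could trigger false breaks, so it would likely need tuning against the actual file.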

Thanks,
Bob
