detection of column breaks and page breaks in PDF document

Robert Rodini Fri, 23 May 2025 08:03:18 -0700

This question is informational.  I use the PDFBox utilities to extract text from a 
large PDF file.  Every page of the PDF uses a three-column layout. The PDFBox 
CLI utility works well for this, since it reads each column from top to bottom 
and takes the columns from left to right.


Is there a way to use Apache PDFBox to recognize column breaks (start of a new 
column) and page breaks (start of a new page) as the text is being extracted?
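
To make the question concrete, here is a rough sketch of the kind of detection 
I have in mind. It is plain Java and does not call PDFBox at all: it assumes 
each extracted line already comes with a page number, a left-edge x, and a y 
measured down from the top of the page. In PDFBox those values would presumably 
come from a PDFTextStripper subclass (writeString receives the TextPosition 
list, and startPage/endPage mark page boundaries), but that wiring is my 
assumption and is not shown. The class name, the Line type, the marker strings, 
and the 100-point threshold are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BreakDetectionSketch {

    /** One extracted line (hypothetical type): page, left edge x, distance y from page top. */
    public static final class Line {
        final int page;
        final float x;
        final float y;
        final String text;

        public Line(int page, float x, float y, String text) {
            this.page = page;
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Walks the lines in extraction order and inserts a marker before the
     * first line of each new page or new column.
     * Heuristic (an assumption, not PDFBox behavior): a column break is a
     * jump to the right by more than minColumnShift combined with a jump
     * back up the page.
     */
    public static List<String> annotate(List<Line> lines, float minColumnShift) {
        List<String> out = new ArrayList<>();
        Line prev = null;
        for (Line cur : lines) {
            if (prev != null) {
                if (cur.page != prev.page) {
                    out.add("[PAGE BREAK]");
                } else if (cur.x - prev.x > minColumnShift && cur.y < prev.y) {
                    out.add("[COLUMN BREAK]");
                }
            }
            out.add(cur.text);
            prev = cur;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up coordinates in points, only to exercise the heuristic.
        List<Line> lines = Arrays.asList(
                new Line(1, 50f, 100f, "column 1, first line"),
                new Line(1, 50f, 700f, "column 1, last line"),
                new Line(1, 250f, 100f, "column 2, first line"),
                new Line(2, 50f, 100f, "next page, first line"));
        for (String s : annotate(lines, 100f)) {
            System.out.println(s);
        }
    }
}
```

With a three-column page the threshold would need to be smaller than the 
distance between column left edges but larger than any paragraph indentation, 
so it would have to be tuned per document.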

Thanks,
Bob
