Thank you, Samuel. What are the bash commands to make pdfcrop and pdftk work? Thanks.
Regards,
Henry

On Thu, 27 Oct 2022 at 12:27 PM, Samuel Thibault <sthiba...@debian.org> wrote:

> Henry Chang, on Thu, 27 Oct 2022 12:08:20 -0400, wrote:
> > I found that the original 11470644.pdf is formatted in two columns. The text
> > on a line of the first column gets mixed up with the text on the line of the
> > second column at the same position.
>
> Perhaps you can use pdfcrop and pdftk to split pages into the left and
> the right parts, and join them together again in a single pdf file that
> you can feed to tesseract.
>
> Samuel

--
Muchiu (Henry) Chang, PhD. Cantab
Patent Mapping Intelligence Researcher & Monte Carlo Modeling Simulation Expert
https://www.linkedin.com/in/mcc212/ <http://www.slideshare.net/mcc212>
tel. +1-416-828-5676
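
For reference, the pdfcrop/pdftk approach suggested in the quoted message could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a tested recipe for this particular file: it assumes every page is US Letter (612 x 792 pt) and that the gap between the columns sits at the horizontal midpoint. The function name split_columns, the intermediate file names left.pdf and right.pdf, and the output file name are hypothetical. The real page size can be checked with pdfinfo from poppler-utils.

```shell
#!/usr/bin/env bash
# Sketch: split a two-column PDF into left/right halves, then interleave them.
# ASSUMPTIONS: page size is 612x792 pt (US Letter) and the columns split at
# the horizontal midpoint. Adjust W and H to match the actual file.

W=612              # assumed page width in PostScript points
H=792              # assumed page height in PostScript points
MID=$((W / 2))     # assumed x position of the column gap

# pdfcrop --bbox expects "left bottom right top"
LEFT_BBOX="0 0 $MID $H"
RIGHT_BBOX="$MID 0 $W $H"

# split_columns is a hypothetical helper name, not part of either tool.
split_columns() {
  local in="$1" out="$2"
  # Crop every page to its left half, then to its right half.
  pdfcrop --bbox "$LEFT_BBOX" "$in" left.pdf
  pdfcrop --bbox "$RIGHT_BBOX" "$in" right.pdf
  # shuffle alternates pages: left p1, right p1, left p2, right p2, ...
  pdftk A=left.pdf B=right.pdf shuffle A B output "$out"
}

# Example call (output name is hypothetical):
# split_columns 11470644.pdf 11470644-onecol.pdf
```

If the columns are not split exactly at the midpoint, MID would need to be set by hand after inspecting a page.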