Hello Jean-Philippe,

The explanation is that this is a two-column document, i.e. in geometrical order, numbers (4) and (5) really are positioned above (1), (2) and the following ones.
Using "pdftotext -layout file.pdf" keeps the document layout intact, so you can see the first column on the left side and the second column on the right side. The pages are meant to be read top down on the left, then top down on the right. That is indeed confusing, but at least the text order doesn't get messed up in the text representation when the -layout option is present.

In order to fix this, i.e. create single-column text, you would need to copy and paste the document column-wise, page by page, which is probably not supported by evince. tesseract can do a column-wise OCR on a scanned document, but converting the file to a picture and then running OCR on it will probably introduce even more errors.

Regards
-Klaus

On Tue, Oct 13, 2015 at 12:55:12AM +0200, MENGUAL Jean-Philippe wrote:
> Hi,
>
> I'm trying to read a European law, and for the 1st time I cannot. It's a
> pdf file. It's here:
> http://demo.accelibreinfo.eu/remit.pdf
>
> 1. I tried pdftotext
> 2. I opened with Evince then Atril, ctrl-a, ctrl-c, ctrl-v in gedit/pluma.
>
> Without understanding the language, you will easily see that the numbers are
> disordered. Instead of (1) (2) (3) (4), the doc starts with (4), (5), then
> (1), (2). Confusing.
>
> An explanation? An idea to fix? What should I do (including report a bug
> somewhere)?
>
> Thanks.
>
> Regards,
>
> --
>
> Jean-Philippe MENGUAL
>
> HYPRA, progressons ensemble
>
> Tél.: 01 84 73 06 61
> Mail: cont...@hypra.fr
>
> Site Web: http://hypra.fr
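
As a rough sketch of the column-wise idea, the text produced by "pdftotext -layout" could be cut at the gutter and re-joined, left column first, page by page. This is only an illustration under several assumptions: the function name split_columns and the gutter parameter are made up for this example, the document has exactly two columns, the right column starts at the same character offset on every line, and pages are separated by form feed characters (which pdftotext emits by default). Headers or lines that span both columns would be cut in the middle, and blank lines (paragraph breaks) are dropped.

```python
def split_columns(layout_text: str, gutter: int) -> str:
    """Reorder two-column layout text into single-column text.

    `gutter` is the character offset at which the right column starts.
    It is an assumption that has to be found by inspecting the output
    of "pdftotext -layout" for the document at hand.
    """
    pages_out = []
    # pdftotext separates pages with a form feed character.
    for page in layout_text.split("\f"):
        left, right = [], []
        for line in page.splitlines():
            # Everything before the gutter belongs to the left column,
            # everything from the gutter onwards to the right column.
            left.append(line[:gutter].rstrip())
            right.append(line[gutter:].strip())
        # Read the left column top down, then the right column top down.
        # Empty fragments are skipped, so blank lines are not preserved.
        column_text = [part for part in left + right if part]
        if column_text:
            pages_out.append("\n".join(column_text))
    # Separate pages with one blank line.
    return "\n\n".join(pages_out)
```

To try it, one would first run "pdftotext -layout file.pdf out.txt", look at out.txt to find the offset where the right column begins, and then pass the file contents and that offset to split_columns. If the gutter is not at a fixed offset throughout the document, this simple cut will not work and a per-page offset would be needed.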