On Thursday, 31 July 2014 23:14:30 UTC+2, zdenop wrote:
> I do not have the time to have a look at this issue yet, but forcing the user
> to use lossless compression is not the right way, IMO.
> The right way is to implement an option for the user to force tesseract to use
> lossless compression, but this feature is not provided by your "patch"...

@zdenop @jimregan

Dear zdenop, dear Jim,
yes, thanks. I was thinking about an option --force-lossless-compression, but after inspecting the manual page at http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html, I think that Tesseract supports only a few command-line options and instead (mainly) takes its options from a config file. So I will modify my code so that lossless compression can be forced by means of a switch in the config file.

Question 1
==========

Please let me know whether you like my approach (config parameter), or whether you would also support my proposal for a command-line switch (--force-lossless-compression).

BTW, it was and is clear to me that a final patch must not contain commented-out (dead) code.

Question 2
==========

While we are at it, I have a question: I may be wrong, but while inspecting the code I found some pieces that indicate "multi-page" actions. Does Tesseract also support OCR-ing a PDF that has many pages?

At the moment I have a script (using pdftk/PDFToolkit) that splits a PDF into single image files, which I then convert one by one via Tesseract * * pdf option, and which I then have to collate again, with another script, into the final single mixed-mode output PDF file.

Are there initiatives to integrate this into Tesseract?
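
To make the per-page workflow from Question 2 concrete, here is a rough sketch of its OCR and collate steps. It is not my actual script: it assumes the split step has already produced one image file per page, that `tesseract` and `pdftk` are on the PATH, and that this Tesseract build accepts the `pdf` config name as its last argument. The function names and file names are hypothetical, chosen only for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(page_images: List[str], final_pdf: str) -> List[List[str]]:
    """Build (but do not run) one OCR command per page plus the final merge."""
    commands: List[List[str]] = []
    page_pdfs: List[str] = []
    for image in page_images:
        # tesseract takes an output base name and appends ".pdf" itself.
        base = str(Path(image).with_suffix(""))
        commands.append(["tesseract", image, base, "pdf"])
        page_pdfs.append(base + ".pdf")
    # pdftk concatenates the single-page PDFs in the order given.
    commands.append(["pdftk", *page_pdfs, "cat", "output", final_pdf])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only; nothing is executed here.
    for cmd in build_commands(["page_0001.tif", "page_0002.tif"], "final.pdf"):
        print(" ".join(cmd))
```

Keeping the command construction separate from `run_all` makes it possible to inspect the plan before anything is executed. Native multi-page support in Tesseract would replace both steps with a single call.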