On Thursday, 31 July 2014 23:14:30 UTC+2, zdenop wrote:
>
> I have not had time to look at this issue yet, but forcing the user 
> to use lossless compression is not the right way, IMO.
> The right way is to implement an option that lets the user force tesseract 
> to use lossless compression, but this feature is not provided by your "patch"...
>
> @zdenop
@jimregan
 
Dear zdenop, dear Jim

Yes, thanks. I was thinking about an option --force-lossless-compression, 
but after inspecting the 
http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html 
manual page, I think that Tesseract supports only a few command line 
options. Instead, it mainly expects options to be set in a config file. 

So I will modify my code so that lossless compression can be forced by 
a switch in the config file. 

Question 1
========

Could you please let me know whether you like my approach (a config 
parameter), or whether you would also support my proposal for a command line 
switch (--force-lossless-compression)?

BTW, it was and is clear to me that a final patch must not contain 
commented-out (dead) code.


Question 2
========

While we are at it, I have a question. I may be wrong, but while inspecting 
the code I found some pieces that suggest "multi-page" handling. My question: 
does Tesseract also support OCR-ing a PDF that has many pages? 

At the moment I have a script (using pdftk/PDFToolkit) that splits a PDF into 
single image files. I then convert these one by one via the Tesseract * * pdf 
option, and finally collate the results with another script into the final 
single mixed-mode output PDF file. 
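
To make the workflow above concrete, here is a minimal sketch of the OCR and collate steps. It assumes that the per-page image files already exist, that tesseract and pdftk are on the PATH, and that Tesseract is invoked as `tesseract <image> <outputbase> pdf`. The function names and file names are hypothetical and only serve as an illustration; this is not my actual script.

```python
import subprocess
from pathlib import Path


def build_commands(page_images, final_pdf):
    """Return (per-page OCR commands, merge command) as argument lists."""
    ocr_cmds = []
    page_pdfs = []
    for image in page_images:
        # Tesseract appends ".pdf" to the output base on its own.
        base = str(Path(image).with_suffix(""))
        ocr_cmds.append(["tesseract", str(image), base, "pdf"])
        page_pdfs.append(base + ".pdf")
    # pdftk's "cat" operation concatenates the per-page PDFs in order.
    merge_cmd = ["pdftk", *page_pdfs, "cat", "output", str(final_pdf)]
    return ocr_cmds, merge_cmd


def ocr_pages(page_images, final_pdf):
    """Run OCR on every page image, then collate the results (assumed setup)."""
    ocr_cmds, merge_cmd = build_commands(page_images, final_pdf)
    for cmd in ocr_cmds + [merge_cmd]:
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling `ocr_pages(["page-001.tif", "page-002.tif"], "out.pdf")` would run Tesseract once per page and then merge the per-page PDFs. The step that splits the input PDF into images is left out on purpose, since it depends on the tools at hand.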

Are there initiatives to integrate this into Tesseract ?
