If you did not install the osd[1] data file, is it really a config bug?

[1]
https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.01.osd.tar.gz
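
A minimal Python sketch of a check you could run before using the pdf config. Page segmentation mode 1 is automatic page segmentation with orientation and script detection (OSD), which is probably why the pdf config asks for osd.traineddata. The tessdata path below is an assumption based on the /usr/local prefix mentioned in this thread, and the helper names osd_is_installed and build_pdf_command are made up for illustration.

```python
import os

# Assumption: tessdata lives under the /usr/local prefix; adjust to your install.
TESSDATA_DIR = "/usr/local/share/tessdata"


def osd_is_installed(tessdata_dir):
    """Return True if osd.traineddata is present in tessdata_dir."""
    return os.path.isfile(os.path.join(tessdata_dir, "osd.traineddata"))


def build_pdf_command(image, out_base, tessdata_dir):
    """Build the 'tesseract imagefile outfile pdf' call quoted in this thread.

    Raises FileNotFoundError when the OSD data file is missing, instead of
    letting Tesseract complain about it at run time.
    """
    if not osd_is_installed(tessdata_dir):
        raise FileNotFoundError(
            "osd.traineddata not found in " + tessdata_dir
        )
    # The trailing 'pdf' names the config file in tessdata/configs.
    return ["tesseract", image, out_base, "pdf"]


if __name__ == "__main__":
    try:
        print(build_pdf_command("imagefile", "outfile", TESSDATA_DIR))
    except FileNotFoundError as err:
        print(err)
```

The sketch only builds the command list; passing it to subprocess.run would start the actual OCR run.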

Zdenko


On Fri, Feb 28, 2014 at 5:09 PM, Bernard Polarski <[email protected]> wrote:

> Thanks for the tip !
>
> I see a file 'pdf' in tessdata/configs with two values in it:
>
> tessedit_create_pdf 1
> tessedit_pageseg_mode 1
>
>
> It sounds like this 'tessedit_pageseg_mode 1' parameter tells Tesseract to
> include the hOCR. (I produced one with the base name of the output file.)
> In any case, it worked.
>
> I had an issue with Tesseract complaining about a file named
> osd.traineddata. I copied eng.traineddata to this name and it was OK.
> It sounds like a config bug; I have no idea where it comes from.
>
>
>
>
> On Friday, February 28, 2014 15:53:07 UTC+1, Quan Nguyen wrote:
>
>> I use:
>>
>> tesseract.exe imagefile outfile pdf
>>
>> On Friday, February 28, 2014 4:57:57 AM UTC-6, Bernard Polarski wrote:
>>>
>>> Indeed, and I am currently exploring this. I compiled 3.03 in Cygwin
>>> (I had to remove the -std=c++11 flag from CXXFLAGS in configure and
>>> configure.ac).
>>> I ended up with a set of binaries in /usr/local/bin:
>>>
>>> -rwxr-xr-x 1 P0957 Domain Users   68047 Feb 26 12:46 convertfilestopdf.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   65424 Feb 26 12:46 convertfilestops.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   69965 Feb 26 12:46 convertformat.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   70510 Feb 26 12:46 convertsegfilestopdf.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   66500 Feb 26 12:46 convertsegfilestops.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   63798 Feb 26 12:46 converttopdf.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   65555 Feb 26 12:46 converttops.exe
>>> -rwxr-xr-x 1 P0957 Domain Users 6585300 Feb 26 12:46 cyglept-4.dll
>>> -rwxr-xr-x 1 P0957 Domain Users   76194 Feb 26 12:46 fileinfo.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   69640 Feb 26 12:46 printimage.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   73276 Feb 26 12:46 printsplitimage.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   63738 Feb 26 12:46 printtiff.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   69765 Feb 26 12:46 splitimage2pdf.exe
>>> -rwxr-xr-x 1 P0957 Domain Users 3208652 Feb 27 10:00 tesseract.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   76794 Feb 26 12:46 xtractprotos.exe
>>> I have not found any documentation on these yet. As a last resort, I will
>>> have to review the C code of each one to figure out the usage and the
>>> differences between them.
>>> My first experiments with 'convertsegfilestopdf.exe' did not succeed
>>> in integrating the hOCR into the PDF.
>>> I only succeeded in producing a standalone PDF. 'fileinfo' is definitely
>>> welcome.
>>>
>>> On Friday, February 28, 2014 01:15:41 UTC+1, Quan Nguyen wrote:
>>>
>>>> Beginning with 3.03, Tesseract includes support for searchable PDF output.
>>>>
>>>> On Thursday, February 27, 2014 8:17:15 AM UTC-6, Bernard Polarski
>>>> wrote:
>>>>>
>>>>> I cannot find binaries of hocr2pdf from exact-image for Windows
>>>>> (not even for Cygwin).
>>>>> There are quite a few Python scripts, but I could not get any of
>>>>> them to work.
>>>>> There is always a missing library, and many of them include parts of
>>>>> exact-image.
>>>>>
>>>>> As for hocr2pdf.net, there is no binary either. It seems to
>>>>> be only a library.
>>>>>
>>>>> Does anyone know of a tool, still available, that converts the hOCR
>>>>> output from Tesseract into a PDF?
>>>>>
>>>>  --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
