I have not run QBE on Windows for a long time.
Try this build (QBE plus its dependencies) [1]. I run it on Windows 7 Pro 64-bit
(even though the application and libraries are 32-bit, built with MinGW 4.8,
Leptonica 1.70 and Tesseract 3.03rc1).

[1] http://www.sk-spell.sk.cx/tmp/qtb-1.11.1.ZIP

Zdenko


On Mon, Mar 10, 2014 at 7:21 AM, Bernard Polarski <[email protected]> wrote:

> I downloaded QBE and the additional libraries, but it does not start on my
> Windows 7. I just get the message that the application has stopped working
> and Windows has to close it.
>
>
> On Sunday, March 9, 2014 21:19:23 UTC+1, zdenop wrote:
>>
>> If I understood you correctly, you would like to have something like
>> this:
>>
>> tesseract lm-110.jpg lm-110 -l fra makebox
>>
>>
>> which creates a box file, and then some tool that replaces the symbol (text)
>> part of the box file with the content of e.g. lm-110.txt (the certified
>> text)? I did this with QBE [1]. But QBE has some limitations:
>>
>>    - there must be one symbol per box
>>    - the number of boxes must equal the number of symbols in your text
>>    file (not counting spaces)
>>
>>  So my workflow was something like this:
>>
>>    1. create a box file (or open the image in QBE; it will offer to
>>    create the box file)
>>    2. remove unnecessary boxes (header, footer, page numbers, scan
>>    artifacts...)
>>    3. split multi-symbol boxes (boxes that contain more than one
>>    symbol)
>>    4. import the text from an external file (QBE -> File -> Import... ->
>>    Import text file)
>>
>> It still needs user interaction (it is not automatic), but it can help if
>> you need something like that.
>>
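The text-import step above can also be sketched outside QBE. The Python below is a minimal sketch, not QBE's actual code: it assumes the Tesseract 3.x box format "<symbol> <left> <bottom> <right> <top> <page>", boxes already in reading order, and the two limitations listed above (one symbol per box, and a box count equal to the number of non-space characters in the text). The function name apply_certified_text and the script name apply_text.py are hypothetical.

```python
import sys
import unicodedata


def apply_certified_text(box_lines, text):
    """Return box lines whose symbol field is replaced by the certified text.

    Assumes the Tesseract 3.x box format
    "<symbol> <left> <bottom> <right> <top> <page>", exactly one symbol per
    box, and boxes already in reading order (hypothetical helper).
    """
    # NFC so that "e" + combining accent counts as the single symbol "é".
    normalized = unicodedata.normalize("NFC", text)
    # Spaces and line breaks have no boxes, so drop all whitespace.
    symbols = [ch for ch in normalized if not ch.isspace()]
    boxes = [line.rstrip("\r\n") for line in box_lines if line.strip()]
    if len(symbols) != len(boxes):
        raise ValueError(
            f"{len(boxes)} boxes but {len(symbols)} non-space symbols in the text"
        )
    result = []
    for symbol, line in zip(symbols, boxes):
        # Keep the last five fields (coordinates and page), drop the OCR'd symbol.
        fields = line.rsplit(" ", 5)
        result.append(" ".join([symbol] + fields[1:]))
    return result


def main(box_path, text_path):
    with open(box_path, encoding="utf-8") as f:
        box_lines = f.readlines()
    with open(text_path, encoding="utf-8") as f:
        text = f.read()
    for line in apply_certified_text(box_lines, text):
        print(line)


# Only run as a command when given exactly a box file and a text file.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage, with hypothetical file names: python apply_text.py lm-110.box lm-110.txt > lm-110.new.box. The count check mirrors the QBE limitation; a mismatch usually means a box still has to be removed or split (steps 2 and 3).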
>> [1] https://github.com/zdenop/qt-box-editor
>>
>> Zdenko
>>
>>
>> On Sat, Mar 8, 2014 at 7:47 PM, Bernard Polarski <[email protected]> wrote:
>>
>>>  Let me summarize what I am doing and what I am trying to achieve.
>>>
>>> Tesseract is excellent at recognizing digital fonts (fonts that come
>>> from a computer, whether printed or generated directly by an
>>> application).
>>>
>>> The match is near perfect, and often it is perfect.
>>> And with digital fonts it is now easy to train on a text in a zillion
>>> fonts:
>>>
>>>    text2image --text=$FIN --outputbase=$FOUT --fonts_dir=$FONT_DIR \
>>>        --render_per_font --find_fonts
>>>
>>> This renders the text in a zillion fonts. This is a big plus of version
>>> 3.03, but honestly, this work has already been done at Google.
>>>
>>> But training data built from digital fonts is disappointing when applied
>>> to printed fonts, especially for books from the 19th century.
>>> I belong to a group that produces EPUB editions of 19th-century books.
>>> These books come in collections, and a collection was often printed
>>> on the same machine.
>>>
>>> So instead of creating a library of 'Century old school' fonts, I am
>>> exploring the idea of creating a font dedicated to one publisher for a
>>> given period, i.e. 'EFlammarion1870.ttf', to be used on these books.
>>>
>>> I already have plenty of scripts to automatically generate a traineddata
>>> file, starting from a directory containing img.tif files and their img.box
>>> files. But it is very time-consuming to generate every one of these box
>>> files.
>>>
>>> The idea is to start from a set of scanned images and grab a certified
>>> text from a site like Gutenberg (for French, ebooksgratuits.com provides
>>> more books). A search on the first 3 words of a page in the certified text
>>> then yields the needed certified transcription.
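A minimal Python sketch of that search, under these assumptions: the page has first been OCR'd with the stock French data (for example tesseract lm-110.jpg lm-110 -l fra, which writes lm-110.txt), its first three OCR'd words are correct, and the end of the page is found by searching for the last three OCR'd words, which goes beyond the idea described above. The function name find_page_span is hypothetical. OCR errors, or a word hyphenated across the page boundary, will make the exact search fail; in that case other words have to be picked.

```python
import re


def _words_pattern(words):
    # Match the words in order, allowing any whitespace (spaces, line breaks) between them.
    return re.compile(r"\s+".join(re.escape(w) for w in words))


def find_page_span(book_text, page_ocr_text, n_words=3):
    """Return the part of book_text covering one OCR'd page, or None if not found.

    Hypothetical helper: the start is located with the first n_words OCR'd
    words and the end with the last n_words; both must appear verbatim in
    the certified text.
    """
    words = page_ocr_text.split()
    if len(words) < 2 * n_words:
        raise ValueError("page text too short to search for")
    start = _words_pattern(words[:n_words]).search(book_text)
    if start is None:
        return None
    # Look for the end only from the start onward, so earlier occurrences are ignored.
    end = _words_pattern(words[-n_words:]).search(book_text, start.start())
    if end is None:
        return None
    return book_text[start.start():end.end()]
```

An ebook text typically comes back without the printed page's line breaks and end-of-line hyphens (e.g. "litho-" / "graphique"), so those hyphens have to be restored before the number of symbols can match the number of boxes.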
>>>
>>> So I am now looking for a method to transform the certified text into a
>>> box file, doing this for a few pages in order to quickly generate a new
>>> traineddata and test it.
>>> In this respect, JTessBoxEditor is very good, but generating the box
>>> file with it is too slow and error-prone.
>>>
>>>
>>> Here is a page extracted from "La Maison Nucingen" whose print is quite
>>> bad, so it is interesting.
>>>
>>> http://gallica.bnf.fr/ark:/12148/bpt6k58135211/f107.image.r=la%20maison%20nucingen.langEN
>>>
>>> <https://lh4.googleusercontent.com/-7xPLX_2HR54/UxtWUEx8nBI/AAAAAAAAAB4/ro0vwKP0Oh4/s1600/lm-110.tif>
>>>
>>>
>>> The text:
>>> proposait d’opérer avec ses millions faits d’une
>>> main de papier rose à l’aide d’une pierre litho-
>>> graphique, de jolies petites actions à placer, pré-
>>> cieusement conservées dans son cabinet. Les ac-
>>> tions réelles allaient servir à fonder l’affaire,
>>> acheter un magnifique hôtel et commencer les
>>> opérations. Nucingen se trouvait encore des ac-
>>> tions dans je ne sais quelles mines de plomb ar-
>>> gentifère, dans des mines de houille et dans deux
>>> canaux, actions bénéficiaires accordées pour la
>>> mise en scène de ces quatre entreprises en pleine
>>> activité, supérieurement montées et en faveur, au
>>> moyen du dividende pris sur le capital. Nucin-
>>> gen pouvait compter sur un agio si les actions
>>> montaient, mais le baron le négligea dans ses
>>> calculs, il le laissait à fleur d’eau, sur la place,
>>> afin d’attirer les poissons ! Il avait donc massé
>>> ses valeurs, comme Napoléon massait ses trou-
>>> piers, afin de liquider durant la crise qui se des-
>>> sinait et qui révolutionna, en 26 et 27 les places
>>> européennes. S’il avait eu son prince de Wagram,
>>> il aurait pu dire comme Napoléon du haut du
>>> Santon : « Examinez bien la place, tel jour, à telle
>>> heure, il y aura là des fonds répandus ! » Mais à
>>> qui pouvait-il se confier ? Du Tillet ne soupçonna
>>>
>>>
>>>
>>>
>>> --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>>
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>>
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>