Hi there, 

Any news about training Tesseract 4.0?
I have been trying to train my fonts for a few days, but I still can't get 
anywhere. I'm looking for a compact manual on training new fonts from 
txt/ttf files in various languages.

I would appreciate any help.

On Tuesday, February 21, 2017 at 8:06:40 AM UTC+1, timothylegg wrote:
>
> I found this thread interesting, since I tried training Tesseract a few 
> years ago and gave up. Has anybody considered writing documentation on 
> this? It is the kind of thing that is best explained in writing when a 
> user can't figure it out by trial and error. I'm open to writing about it 
> if there is a need, but first I will have to understand it better myself.
>
>
> On Thursday, February 9, 2017 at 4:08:13 AM UTC-6, Kay-Michael Würzner 
> wrote:
>>
>> Thanks also from my side. I'll have a look at the jTessBoxEditor beta, 
>> try to set up training, and get back to you.
>>
>> Kay
>>
>> On Wednesday, February 8, 2017 at 3:52:58 PM UTC+1, shree wrote:
>>>
>>> Thanks, Quan
>>>
>>> - excuse the brevity, sent from mobile
>>>
>>> On 08-Feb-2017 7:33 PM, "Quan Nguyen" <nguy...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Tuesday, February 7, 2017 at 9:34:11 AM UTC-6, shree wrote:
>>>>>
>>>>> For LSTM training, box files need an additional entry at the end of 
>>>>> each text line, with a tab character as its symbol, to mark the end of 
>>>>> the line.
>>>>>
>>>>> If you have existing box/tiff pairs, you can use a box editor (such as 
>>>>> jTessBoxEditor) to insert a box at the end of each line and put a tab 
>>>>> character in it.
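
A quick sanity check for the end-of-line entries described above, as a minimal sketch. It assumes each end-of-line entry is a box-file line that starts with a literal tab character; count_eol_markers is a hypothetical helper name, not a Tesseract or jTessBoxEditor tool.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a box file that start with a tab
# character, i.e. the end-of-line entries described above.
count_eol_markers() {
    tab=$(printf '\t')
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c "^${tab}" "$1"
}
```

If the printed count does not match the number of text lines in the image, some end-of-line boxes are probably still missing.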
>>>>>
>>>>
>>>> The jTessBoxEditor beta version has a new Mark EOL function that does 
>>>> just that.
>>>>  
>>>>
>>>>>
>>>>> >On the toolbar, the Character textbox has a built-in conversion 
>>>>> function. If you enter U+0009 and press Enter or click the adjacent 
>>>>> Tool icon, the escape sequence will be converted to Unicode. You can 
>>>>> also enter the tab character via the Alt+09 numpad keys on Windows.
>>>>>
>>>>> Or add a dummy sequence such as @@@ and then replace it with a tab 
>>>>> character in a text editor.
>>>>>
>>>>> See attached files as a sample.
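
The placeholder approach can also be scripted instead of done in a text editor. This is only a sketch under assumptions: the placeholder is exactly @@@ at the start of a line and is followed by a space and the box coordinates. The function name and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace the dummy sequence "@@@" at the start of a
# box-file line with a literal tab character. The result goes to stdout, so
# the original file is left untouched.
replace_eol_placeholder() {
    tab=$(printf '\t')
    # Only a leading "@@@ " is replaced; every other line passes through as is.
    sed "s/^@@@ /${tab} /" "$1"
}

# Usage (hypothetical file names); review the output before replacing the
# original box file:
# replace_eol_placeholder sample.box > sample.tab.box
```

Building the tab with printf avoids relying on \t support in sed, which is not portable across implementations.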
>>>>>
>>>>> Then modify tesstrain.sh to copy the box/tiff pairs to the training 
>>>>> directory before training starts:
>>>>>
>>>>>
>>>>>
>>>>> mkdir -p ${TRAINING_DIR}
>>>>> tlog "\n=== Starting training for language '${LANG_CODE}'"
>>>>>
>>>>> cp ./*.box "${TRAINING_DIR}/"
>>>>> cp ./*.tif "${TRAINING_DIR}/"
>>>>>
>>>>>
>>>>> On Tue, Feb 7, 2017 at 8:27 PM, Kay-Michael Würzner <wuer...@gmail.com> wrote:
>>>>>
>>>>>> +1 for this question. The training documentation for Tesseract 4.0 so 
>>>>>> far only covers training with font files (synthetic material). What is 
>>>>>> missing is information on training with real data (i.e. manually aligned 
>>>>>> ground truth).
>>>>>> Any hints on that matter are greatly appreciated.
>>>>>>
>>>>>> Cheers,
>>>>>> Kay
>>>>>>
>>>>>> On Wednesday, January 18, 2017 at 12:31:54 AM UTC+1, 
>>>>>> chen...@huawei.com wrote:
>>>>>>>
>>>>>>> I have a bunch of images containing English words.
>>>>>>> I would like to generate training data from these images and then run 
>>>>>>> the training.
>>>>>>> How should I do this?
>>>>>>>
>>>>>>> Thanks a lot.
>>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/7bffab95-3e6b-4165-929e-a152f1799703%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/7bffab95-3e6b-4165-929e-a152f1799703%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>
>>>
