Thank you so much for guiding me.

I have read the links and sub-links provided and, as suggested, I will use OCR-D
(https://github.com/OCR-D/ocrd-train) for training.
I want to know the best way to create pairs of [*.tif, *.gt.txt] from a TIFF
image for two or more fonts. Is there any specific tool to generate the line
*.tif and *.gt.txt files as required by OCR-D?
I have data like the TIFF image below (20 images in total). Please guide me.
Thank you.
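
To illustrate the [*.tif, *.gt.txt] pairing asked about above: as far as I understand the ocrd-train layout, each line image `name.tif` sits next to a single-line transcription `name.gt.txt`. Here is a minimal sketch, assuming the line images already exist and that a transcript lists one text line per image in sorted filename order. The function name `write_gt_files` and the file naming are hypothetical, not part of OCR-D.

```python
from pathlib import Path


def write_gt_files(image_dir, transcript_lines):
    """Write one <name>.gt.txt next to each <name>.tif, in sorted filename order.

    Assumption: the n-th non-empty transcript line belongs to the n-th image.
    """
    images = sorted(Path(image_dir).glob("*.tif"))
    texts = [t.strip() for t in transcript_lines if t.strip()]
    if len(images) != len(texts):
        # Refuse to guess: a mismatch would silently mislabel every later line.
        raise ValueError(
            f"{len(images)} line images but {len(texts)} transcript lines"
        )
    written = []
    for image, text in zip(images, texts):
        gt_path = image.with_name(image.stem + ".gt.txt")
        gt_path.write_text(text + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

It would be called as `write_gt_files("data/ground-truth", open("transcript.txt", encoding="utf-8"))`, where both paths are placeholders. Raising on a count mismatch is deliberate, since one shifted line would give wrong ground truth for the rest of the set.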

<https://lh3.googleusercontent.com/-wdzw32GT4fk/W04iwd71ldI/AAAAAAAAJFA/lx3BfSnCujkKmch4oGRSJLFgkKG1uvuTgCLcBGAs/s1600/SCAN_20180716_145539118.tiff>
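
A full-page scan like the one linked above has to be cut into text lines before it can be used as line-level ground truth. One common approach is a horizontal projection profile: rows that contain ink belong to a line, blank rows separate lines. The sketch below is not an OCR-D tool. It assumes the page is already binarized and deskewed, and that it has been loaded into a plain list of pixel rows (for example with an imaging library, which is left out here). `find_line_bands`, `min_ink` and `min_height` are hypothetical names.

```python
def find_line_bands(rows, min_ink=1, min_height=2):
    """Return (top, bottom) row ranges, bottom exclusive, that contain ink.

    rows: list of pixel rows, each a list of 0 (background) or 1 (ink).
    Bands shorter than min_height rows are treated as specks and dropped.
    """
    bands = []
    start = None
    for y, row in enumerate(rows):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            if y - start >= min_height:    # a text line ends at a blank row
                bands.append((start, y))
            start = None
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(rows) - start >= min_height:
        bands.append((start, len(rows)))
    return bands


page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],   # one-row speck, ignored
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(find_line_bands(page))  # -> [(1, 3), (6, 8)]
```

Each band can then be cropped and saved as its own line *.tif. Skewed scans or lines that touch each other will not separate cleanly this way, so the results need a visual check.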


On Wednesday, July 4, 2018 at 8:20:54 PM UTC+5:30, Joe wrote:
>
> Hi everybody!
>
> I'm trying this tool https://github.com/OCR-D/ocrd-train/ but without 
> success so far. Tesseract and Leptonica are installed by the scripts.
> Inspired by the test set provided in that repo, I created pairs of [*.tif, 
> *.gt.txt] with binarized chars and TTF's from two fonts (1869 text lines in 
> total).
> You can see an example of my set in attachment that also contains files 
> created by the training process.
>
> My guess is that something is wrong with my data.
> Sometimes I can see the char train value increasing instead of decreasing, 
> and the final error rate is still too high (about 60%).
>
> That new training process with LSTM is driving me crazy!
> I would appreciate it if anyone with experience could take a look at my 
> data set.
>
>
> Joe. 


On Tuesday, July 17, 2018 at 9:04:08 PM UTC+5:30, Lorenzo Blz wrote:
>
>
> Have a look at this thread:
>
> https://groups.google.com/forum/#!topic/tesseract-ocr/be4-rjvY2tQ
>
>
> It's easier than it seems, you do not need per character boxes with 4.0, 
> just one per line (that ocr-d automatically generates). If your text is 
> already split into lines you do not have to do anything more.
>
> Unicharset and lstmf files are also created by ocr-d.
>
>
> Feel free to ask if you get stuck. I have this working now, but it's a 
> bumpy road (lots of failed assertions and segmentation faults if you miss 
> something). 
>
>
> Bye
>
> Lorenzo
>
> 2018-07-17 15:03 GMT+02:00 Ramakant Kushwaha <ramakant...@gmail.com>:
>
>> Hi,
>>
>> Recently I have been trying to retrain Tesseract 4.0 to recognise 
>> handwritten digits. I am following the official page but finding it very 
>> difficult. It would be great if someone could elaborate on the steps below:
>>
>> - Prepare training text. 
>> <https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951>
>> (I am using jTessBoxEditor for creating box files.)
>> - Render text to image + box file. (Or create hand-made box files for 
>> existing image data.)
>> - Make unicharset file. (Can be partially specified, i.e. created 
>> manually.) (I do not know how to do this.)
>> - Make a starter traineddata from the unicharset and optional dictionary 
>> data. 
>> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-starter-traineddata>
>> - Run tesseract to process image + box file to make training data set.
>> - Run training on training data set.
>> - Combine data files.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/97e29010-f602-42e9-b3b8-121fb151a49e%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

