[image: 1_7wBhusJmIwkiwV-J3LJ7lw.png]
I am not trying to train the whole model in an unsupervised way. I only 
want to train the language model, which acts as the final layer of tesseract, 
to generate variable-length sequences; this would act as a *pre-training* 
step. As the image shows, other language models feed the output of the 
previous timestep in as the input to the next timestep. In the same way, I 
could feed in my own sequences so that the network picks up some additional 
information about them, and afterwards it could be fine-tuned in a 
supervised manner on labelled data. 
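
To make the idea concrete, here is a rough sketch of what I mean by feeding 
the previous output back in as the next input. It is not tesseract code and 
does not use any tesseract API: it swaps in a simple count-based character 
bigram model in place of an LSTM, and the sample IDs, the START/END markers 
and the function names are all made up for illustration.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary markers, not Tesseract symbols


def train_bigram(sequences):
    """Count next-character frequencies from unlabelled strings only."""
    counts = defaultdict(Counter)
    for seq in sequences:
        chars = [START] + list(seq) + [END]
        # The input at step t is the character from step t-1;
        # the target is the character at step t.
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng, max_len=20):
    """Sample a variable-length string, feeding each output back in."""
    out, prev = [], START
    for _ in range(max_len):
        options = counts[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt  # the output of this step becomes the next input
    return "".join(out)


# Made-up invoice-style IDs, for illustration only.
samples = ["INV-1042", "INV-2210", "INV-0931", "TAX-77A", "TAX-12B"]
model = train_bigram(samples)
print(generate(model, random.Random(0)))
```

No labelled images are involved here, only the text of the IDs, which is 
the part I am hoping can be done before the supervised fine-tuning.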

On Monday, 15 October 2018 13:38:03 UTC+5:30, Soumik Ranjan Dasgupta wrote:
>
> No, tesseract cannot be trained in an unsupervised manner; it needs ground 
> truth labels to train from scratch or to fine-tune. Please provide a sample 
> image to test, if possible.
>
> On Mon, Oct 15, 2018 at 12:38 PM Rahul Tyagi <rahul....@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to run tesseract-ocr on invoices to detect user IDs, invoice 
>> numbers, tax codes, etc. I think tesseract has not been trained on this kind 
>> of data, so I need to fine-tune the network on my data. It will be a bit 
>> difficult for me to get labelled data to fine-tune tesseract as described on 
>> the training-tesseract wiki page, so I wanted to know whether it is possible 
>> to tune only the language model of tesseract-ocr in an unsupervised way, just 
>> like the language models trained for English language understanding, i.e. 
>> showing the language model just the PINs and IDs by passing the output 
>> generated at the previous timestep (t-1) as input to the current timestep (t). 
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/276e6dcf-f0b5-43e0-a794-d1bb69c88857%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> -- 
> Regards,
> Soumik Ranjan Dasgupta
>

