[image: 1_7wBhusJmIwkiwV-J3LJ7lw.png]

I am not trying to train the whole model in an unsupervised way. I only want to train the language model that acts as the final layer of tesseract and generates the variable-length sequence; this would serve as a *pre-training* step. As the image shows, other language models feed the output of the previous timestep as input to the next timestep. In the same way, I could feed in my own sequences so that the network picks up some additional information about them, and afterwards it could be fine-tuned in a supervised manner on labelled data.
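To make the "previous timestep as input to the next timestep" idea concrete, here is a minimal sketch in plain Python. It is not tesseract code and uses no tesseract API: the function names, the start/end markers and the sample IDs are all made up for illustration, and a simple bigram count table stands in for a real recurrent language model. It only shows how unlabelled ID strings turn into (input, target) pairs without any image labels.

```python
from collections import Counter, defaultdict

# Hypothetical start/end markers, not part of any tesseract format.
START, END = "<s>", "</s>"


def shifted_pairs(sequence):
    """Pair each symbol with the one that follows it.

    The symbol at step t-1 is the input and the symbol at step t is the
    target, which is the usual next-symbol training setup.
    """
    symbols = [START] + list(sequence) + [END]
    return list(zip(symbols[:-1], symbols[1:]))


def fit_bigram_counts(sequences):
    """Count how often each symbol follows each previous symbol."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in shifted_pairs(seq):
            counts[prev][nxt] += 1
    return counts


def next_symbol_probs(counts, prev):
    """Relative frequencies of the symbols seen after `prev`."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}
    return {sym: n / total for sym, n in counts[prev].items()}


# Made-up invoice-style IDs; no images or ground-truth boxes are involved.
samples = ["INV-1024", "INV-2048", "GST-07AB"]
counts = fit_bigram_counts(samples)
print(shifted_pairs("AB"))
print(next_symbol_probs(counts, "N"))
```

A real sequence model would replace the count table with learned weights, but the data preparation (shift the sequence by one step) is the same.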
On Monday, 15 October 2018 13:38:03 UTC+5:30, Soumik Ranjan Dasgupta wrote:
>
> No, tesseract cannot be trained in an unsupervised manner, it needs ground
> truth labels to train from scratch or fine-tune. Please provide a sample
> image to test if possible.
>
> On Mon, Oct 15, 2018 at 12:38 PM Rahul Tyagi <rahul....@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to run tesseract-ocr on invoices to detect user ID's, Invoice
>> numbers, tax codes etc. I think tesseract has not been trained on this kind
>> of data so i need to fine tune the network on my data. Now it will be a bit
>> difficult for me to get labelled data to fine tune tesseract as stated in
>> training-tesseract wiki page. So wanted to know if its possible to only
>> tune the language model of tesseract-ocr in an unsupervised way just like
>> the language models trained for English Language Understanding i.e. showing
>> the language model just the pins and ids by passing the output generated at
>> previous (t-1) timestep as input to current timestep (t).
>>
>
> --
> Regards,
> Soumik Ranjan Dasgupta
>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/d134ed76-ee34-4460-b835-0eb784bbca7d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.