No, Tesseract cannot be trained in an unsupervised manner; it needs ground-truth labels whether you train from scratch or fine-tune. If possible, please share a sample image so we can test it.
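To make "ground-truth labels" concrete: each training image needs a matching transcription. Below is a minimal Python sketch, not an official tool, that splits a folder into labelled and unlabelled images. It assumes the layout used by the tesstrain project, where a line image such as `inv_001.png` sits next to `inv_001.gt.txt`; the function name, the directory name, and the suffix list are hypothetical.

```python
from pathlib import Path


def find_ground_truth_pairs(data_dir, image_suffixes=(".png", ".tif")):
    """Split images in data_dir into labelled and unlabelled sets.

    Assumption: a transcription for "x.png" is stored as "x.gt.txt"
    in the same directory (the tesstrain-style convention).
    """
    paired, unlabeled = [], []
    for image in sorted(Path(data_dir).iterdir()):
        if image.suffix.lower() not in image_suffixes:
            continue  # skip transcriptions and any other files
        transcript = image.with_name(image.stem + ".gt.txt")
        text = ""
        if transcript.is_file():
            text = transcript.read_text(encoding="utf-8").strip()
        if text:
            paired.append((image, text))
        else:
            # A missing or empty transcription cannot be used for training.
            unlabeled.append(image)
    return paired, unlabeled


if __name__ == "__main__":
    # "ground-truth" is a hypothetical directory name.
    if Path("ground-truth").is_dir():
        paired, unlabeled = find_ground_truth_pairs("ground-truth")
        print(f"{len(paired)} labelled, {len(unlabeled)} unlabelled")
```

Only the images in the first list would be usable for fine-tuning; the second list shows how much labelling work is left.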
On Mon, Oct 15, 2018 at 12:38 PM Rahul Tyagi <rahul.iss...@gmail.com> wrote:

> Hi,
>
> I am trying to run tesseract-ocr on invoices to detect user IDs, invoice
> numbers, tax codes, etc. I think Tesseract has not been trained on this kind
> of data, so I need to fine-tune the network on my data. It will be a bit
> difficult for me to get labelled data to fine-tune Tesseract as described on
> the training-tesseract wiki page. So I wanted to know if it's possible to
> tune only the language model of tesseract-ocr in an unsupervised way, just
> like the language models trained for English language understanding, i.e. by
> showing the language model just the PINs and IDs and passing the output
> generated at the previous timestep (t-1) as input to the current timestep (t).
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/276e6dcf-f0b5-43e0-a794-d1bb69c88857%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Regards,
Soumik Ranjan Dasgupta