-> Using the standard Tesseract API to recognize text:
```
// Load the source image; cv::imread returns pixels in BGR order.
image1 = imread("/home/user/Desktop/src.png");
// Convert to grayscale, then binarize with a fixed threshold of 125.
cv::cvtColor(image1, image1, COLOR_BGR2GRAY);
cv::threshold(image1, image1, 125, 255, THRESH_BINARY);

tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
// Init returns non-zero on failure.
if (api->Init(NULL, "eng")) {
    fprintf(stderr, "Could not initialize tesseract.\n");
    exit(1);
}
// Arguments: pixel data, width, height, bytes per pixel, bytes per line.
api->SetImage((uchar*)image1.data, image1.size().width, image1.size().height, image1.channels(), image1.step1());
char *outText = api->GetUTF8Text();
cout << "outText:" << outText << endl;
// The caller owns the returned text and must release it with delete[].
delete[] outText;
api->End();
```

-> I need to train Tesseract to recognize some symbols more precisely.

-> I am using the guide below (jTessBoxEditor: https://sourceforge.net/projects/vietocr/files/jTessBoxEditor/):

> Step 1: Make box files for the images that we want to train
> Syntax: tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox
> Eg: tesseract train.my.exp0.tif train.my.exp0 batch.nochop makebox
>
> {*Note: After making the box files, we have to correct any wrongly identified characters in them.}
>
> Step 2: Create the .tr file (combining the image file and the box file)
> Syntax: tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] box.train
> Eg: tesseract train.my.exp0.tif train.my.exp0 box.train
>
> Step 3: Extract the charset from the box files (the output of this command is the unicharset file)
> Syntax: unicharset_extractor [langname].[fontname].[expN].box
> Eg: unicharset_extractor train.my.exp0.box
>
> Step 4: Create a font_properties file based on our needs.
> Syntax: echo "[fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)]" > font_properties
> Eg: echo "arial 0 0 1 0 0" > font_properties
>
> Step 5: Training the data.
> Syntax: mftraining -F font_properties -U unicharset -O [langname].unicharset [langname].[fontname].[expN].tr
> Eg: mftraining -F font_properties -U unicharset -O train.unicharset train.my.exp0.tr
>
> Step 6:
> Syntax: cntraining [langname].[fontname].[expN].tr
> Eg: cntraining train.my.exp0.tr
> {*Note: After step 5 and step 6, four files have been created (shapetable, inttemp, pffmtable, normproto).}
>
> Step 7: Rename the four files (shapetable, inttemp, pffmtable, normproto) to ([langname].shapetable, [langname].inttemp, [langname].pffmtable, [langname].normproto)
> Syntax: rename filename1 filename2
> Eg:
> rename shapetable train.shapetable
> rename inttemp train.inttemp
> rename pffmtable train.pffmtable
> rename normproto train.normproto
>
> Step 8: Create the .traineddata file
> Syntax: combine_tessdata [langname].
> Eg: combine_tessdata train.
>
> Move the .traineddata file to the Tesseract program's tessdata directory:
> C:\Program Files\Tesseract-OCR\tessdata
>
> Run tesseract with the trained fonts:
>
> tesseract Test2.png stdout -l train

-> I'm confused about the font name in step 4, as I don't know what the font name is...

> Step 4: Create a font_properties file based on our needs.
> Syntax: echo "[fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)]" > font_properties
> Eg: echo "arial 0 0 1 0 0" > font_properties

-> How do I find the font name to use when reading with TessBaseAPI from C++ and from the command line?
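One possible reading of step 4, stated here as an assumption rather than something the quoted guide confirms: [fontname] is only a label, and what matters is that the name in font_properties matches the [fontname] part of the training file names (for example `my` in train.my.exp0.tif), not the real typeface name. Under that assumption, a minimal sketch that derives the label from a file name and prints a matching font_properties line could look like this. The helper names `font_from_name` and `font_properties_line` are made up for illustration and are not part of Tesseract.

```shell
#!/bin/sh
# Sketch only: assumes training files are named
# [langname].[fontname].[expN].[ext], as in the guide above.

# Print the [fontname] field (second dot-separated part of the base name).
font_from_name() {
    base=${1##*/}                 # strip any directory part
    rest=${base#*.}               # drop the leading "[langname]."
    printf '%s\n' "${rest%%.*}"   # keep everything up to the next dot
}

# Print one font_properties line:
#   name italic bold monospace serif fraktur
# Style flags that are not given default to 0 (hypothetical default).
font_properties_line() {
    printf '%s %s %s %s %s %s\n' "$(font_from_name "$1")" \
        "${2:-0}" "${3:-0}" "${4:-0}" "${5:-0}" "${6:-0}"
}

# Example: label taken from the file name, monospace flag set to 1.
font_properties_line train.my.exp0.tif 0 0 1 0 0
```

Redirecting the last command with `> font_properties` would produce the file that step 5 passes to mftraining via `-F`. The style flags still have to be chosen by looking at the glyphs, since the sketch cannot infer them.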