Please open this as an issue in the GitHub repo: https://github.com/tesseract-ocr/tesseract/issues

> the "/" is added without taking care if the command is used on Windows or Linux.

Found a couple of places in that file where this is the case.

    // Load the unicharset for the script if available.
    string filename = script_dir + "/" + unicharset->get_script_from_script_id(s) + ".unicharset";

and

    // Load the xheights for the script if available.
    string filename = script_dir + "/" + unicharset.get_script_from_script_id(s) + ".xheights";

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Feb 23, 2018 at 2:25 PM, Jehan <jehanpoub...@gmail.com> wrote:

> I'm training Tesseract on Windows for a new font and everything went
> pretty well until the set_unicharset_properties command step:
>
> set_unicharset_properties -U .\unicharset -O .\unicharset2 -F "C:\Windows\Fonts\Roman.tff" --script_dir='C:\Program Files (x86)\Tesseract-OCR\training'
>
>> Loaded unicharset of size 7 from file .\unicharset
>> Setting unichar properties
>> Other case c of C is not in unicharset
>> Other case f of F is not in unicharset
>> Setting script properties
>> Failed to load script unicharset from:C:\Program Files (x86)\Tesseract-OCR\training/Latin.unicharset
>> Warning: properties incomplete for index 3 = C
>> Warning: properties incomplete for index 4 = 0
>> Warning: properties incomplete for index 5 = 1
>> Warning: properties incomplete for index 6 = F
>> Writing unicharset to file .\unicharset2
>
> I've verified that Latin.unicharset is in the right directory.
>
> The problem (I'm pretty sure) is on the end of this line:
>
>> Failed to load script unicharset from:C:\Program Files (x86)\Tesseract-OCR\training/Latin.unicharset
>
> The thing is that the training software adds a "/" instead of a "\".
> I've looked on unicharset_training_utils.cpp, in the line 166, the "/" is
> added without taking care if the command is used on Windows or Linux.
>
> Is there a solution for Windows to load Latin.unicharset even with this "/"?
> If not, what is the easiest solution?
>
> For information, my unicharset2 file looks like that:
>
>> 7
>> NULL 0 Common 0
>> Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
>> |Broken|0|1 f 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
>> C 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 C # C [43 ]A
>> 0 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 0 # 0 [30 ]0
>> ...

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXfoVoMftjJcWOt2Nsts_%3DvKxPj4BAT8zWnNKdjZOPiKg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
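
For the issue report, it may help to show what a separator-agnostic version of the two concatenations quoted above could look like. The following is only a sketch, assuming a C++17 compiler with std::filesystem; JoinScriptPath is a hypothetical helper name and is not part of Tesseract. Note also that Windows file APIs generally accept "/" as well as "\", so the mixed separator in the log may not be the actual cause of the load failure.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not part of Tesseract): builds
// "<script_dir><separator><script_name><extension>" using the separator
// that the current platform prefers ("\" on Windows, "/" elsewhere).
std::string JoinScriptPath(const std::string& script_dir,
                           const std::string& script_name,
                           const std::string& extension) {
  // operator/ inserts a separator only when script_dir does not already
  // end with one, so "training" and "training/" give the same result.
  std::filesystem::path full =
      std::filesystem::path(script_dir) / (script_name + extension);
  return full.string();
}
```

With such a helper, the two lines in question would become something like JoinScriptPath(script_dir, unicharset->get_script_from_script_id(s), ".unicharset") and the same call with ".xheights".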