Please open this as an issue in the GitHub repo:
https://github.com/tesseract-ocr/tesseract/issues

> the "/" is added without taking care if the command is used on Windows
> or Linux.

I found two places in that file (unicharset_training_utils.cpp) where this is
the case:

    // Load the unicharset for the script if available.
    string filename = script_dir + "/" +
                      unicharset->get_script_from_script_id(s) +
                      ".unicharset";

and

    // Load the xheights for the script if available.
    string filename = script_dir + "/" +
                      unicharset.get_script_from_script_id(s) + ".xheights";
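
Both snippets build the path by hard-coding "/". As a rough sketch of one way
this could be made platform-aware (not the actual Tesseract code, and not a
tested patch), the join could go through a small helper. The names JoinPath
and kPathSeparator below are hypothetical and do not exist in Tesseract; the
helper assumes the only goal is to put exactly one separator between the
directory and the file name.

```cpp
#include <string>

// Hypothetical constant (not in Tesseract): the separator preferred by the
// platform this code is compiled for.
#ifdef _WIN32
const char kPathSeparator = '\\';
#else
const char kPathSeparator = '/';
#endif

// Hypothetical helper (not in Tesseract): joins a directory and a file name
// with exactly one separator between them. If the directory already ends in
// '/' or '\\', no extra separator is added.
std::string JoinPath(const std::string& dir, const std::string& name,
                     char sep = kPathSeparator) {
  if (dir.empty()) return name;
  const char last = dir.back();
  if (last == '/' || last == '\\') return dir + name;
  return dir + sep + name;
}
```

With such a helper, the first snippet might become
JoinPath(script_dir, unicharset->get_script_from_script_id(s) + ".unicharset").
If C++17 is available, std::filesystem::path and its operator/ would be
another option. Whether the "/" is really what makes the load fail is not
confirmed here, since Windows file APIs generally accept forward slashes in
ordinary paths.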

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Feb 23, 2018 at 2:25 PM, Jehan <jehanpoub...@gmail.com> wrote:

> I'm training Tesseract on Windows for a new font and everything went
> pretty well until the set_unicharset_properties command step:
>
> set_unicharset_properties -U .\unicharset -O .\unicharset2 -F
> "C:\Windows\Fonts\Roman.tff" --script_dir='C:\Program Files
> (x86)\Tesseract-OCR\training'
>
> Loaded unicharset of size 7 from file .\unicharset
>> Setting unichar properties
>> Other case c of C is not in unicharset
>> Other case f of F is not in unicharset
>> Setting script properties
>> Failed to load script unicharset from:C:\Program Files
>> (x86)\Tesseract-OCR\training/Latin.unicharset
>> Warning: properties incomplete for index 3 = C
>> Warning: properties incomplete for index 4 = 0
>> Warning: properties incomplete for index 5 = 1
>> Warning: properties incomplete for index 6 = F
>> Writing unicharset to file .\unicharset2
>
>
> I've verified that Latin.unicharset is in the right directory.
>
> The problem (I'm pretty sure) is on the end of this line :
>
> Failed to load script unicharset from:C:\Program Files
>> (x86)\Tesseract-OCR\training/Latin.unicharset
>>
>
> The thing is that the training software adds a "/" instead of a "\".
> I've looked on unicharset_training_utils.cpp, in the line 166, the "/" is
> added without taking care if the command is used on Windows or Linux.
>
> Is there a solution for Windows to load Latin.unicharset even with this
> "/" ?
> If not, what is the easiest solution ?
>
> For information, my unicharset2 file looks like that :
>
>> 7
>> NULL 0 Common 0
>> Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e
>> 65 64 ]a
>> |Broken|0|1 f 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
>> C 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 C # C [43 ]A
>> 0 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 0 # 0 [30 ]0
>> ...
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/tesseract-ocr/aa3a131c-51fe-42ea-9fba-336ef89737cd%
> 40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/aa3a131c-51fe-42ea-9fba-336ef89737cd%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

