ModuleNotFoundError: No module named 'bidi'

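Install the python-bidi package (for example, pip install python-bidi). It
provides the bidi module that generate_wordstr_box.py imports.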
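
If it helps, here is a minimal sketch of a check you could run before
starting make, to confirm that the Python interpreter can actually see the
module. The REQUIRED mapping and the missing_packages helper are hypothetical
names for illustration only; they are not part of tesstrain. The only entry
taken from the traceback is bidi, which the python-bidi package provides.

```python
import importlib.util

# Maps a top-level import name to the pip package that provides it.
# "bidi" -> "python-bidi" comes from the traceback in this thread;
# add further entries yourself if other imports turn out to be missing.
REQUIRED = {"bidi": "python-bidi"}


def missing_packages(required=REQUIRED):
    """Return pip package names whose top-level module cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with:")
        print("  python3 -m pip install " + " ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it with the same python3 that the Makefile calls, since a package
installed for a different interpreter will not be found.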

On Thu, Jan 7, 2021, 15:45 Soumik Ranjan Dasgupta <ranjansou...@gmail.com>
wrote:

> Hi Shreeshrii,
>
> I ran your command exactly as given (after making sure the tessdata_best
> directory, with the best ben.traineddata, is present in $HOME) and ran
> into a very strange error.
> Here is the log:
>
> find data/ben-ground-truth -name '*.gt.txt' | xargs cat | sort | uniq >
> "data/ben/all-gt"
> combine_tessdata -u /root/tessdata_best/ben.traineddata  data/ben/ben
> Version
> string:4.00.00alpha:ben:synth20170629:[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx64Lrx64Lfx512O1c1]
> 0:config:size=377, offset=192
> 17:lstm:size=10605707, offset=569
> 18:lstm-punc-dawg:size=3154, offset=10606276
> 19:lstm-word-dawg:size=427618, offset=10609430
> 20:lstm-number-dawg:size=426, offset=11037048
> 21:lstm-unicharset:size=6866, offset=11037474
> 22:lstm-recoder:size=1003, offset=11044340
> 23:version:size=80, offset=11045343
> Extracting tessdata components from /root/tessdata_best/ben.traineddata
> Wrote data/ben/ben.config
> Wrote data/ben/ben.lstm
> Wrote data/ben/ben.lstm-punc-dawg
> Wrote data/ben/ben.lstm-word-dawg
> Wrote data/ben/ben.lstm-number-dawg
> Wrote data/ben/ben.lstm-unicharset
> Wrote data/ben/ben.lstm-recoder
> Wrote data/ben/ben.version
> unicharset_extractor --output_unicharset "data/ben/my.unicharset"
> --norm_mode 2 "data/ben/all-gt"
> Bad box coordinates in boxfile string!  কি জানি কেন প্রদ্যুম্নের বার বার
> মনে আসছিল সেই জীর্ণ পরিচ্ছদপরা
> Extracting unicharset from plain text file data/ben/all-gt
> Wrote unicharset file data/ben/my.unicharset
> merge_unicharsets data/ben/ben.lstm-unicharset data/ben/my.unicharset
> "data/ben/unicharset"
> Loaded unicharset of size 111 from file data/ben/ben.lstm-unicharset
> Loaded unicharset of size 76 from file data/ben/my.unicharset
> Wrote unicharset file data/ben/unicharset.
> PYTHONIOENCODING=utf-8 python3 generate_wordstr_box.py -i
> "data/ben-ground-truth/24-022.tif" -t "data/ben-ground-truth/24-022.gt.txt"
> > "data/ben-ground-truth/24-022.box"
> Traceback (most recent call last):
>   File "generate_wordstr_box.py", line 7, in <module>
>     import bidi.algorithm
> ModuleNotFoundError: No module named 'bidi'
> Makefile:207: recipe for target 'data/ben-ground-truth/24-022.box' failed
> make: *** [data/ben-ground-truth/24-022.box] Error 1
>
> I should mention I double checked the 24-022.gt.txt and 24-022.tif files
> and both of them are valid. Any reason why this might be happening? How can
> I fix this?
> On Saturday, January 2, 2021 at 11:01:27 AM UTC+5:30 shree wrote:
>
>> Soumik,
>>
>> I have uploaded the bash scripts and the generated reports and graphs to
>> the `ben` branch in my fork of the tesstrain repo. See
>>
>> https://github.com/Shreeshrii/tesstrain/tree/ben
>> and
>>
>> https://github.com/Shreeshrii/tesstrain/commit/a6474ef2dbbac47803d13b6f92fdcf8c9dc3107b
>>
>> Results for the validation data (not seen by lstmtraining either for
>> training or eval) show an improvement over both ben and script/Bengali.
>>
>> To improve results further, check the groundtruth transcription for any
>> missing words, normalize the text, and try with some more training data.
>>
>>
>> On Fri, Jan 1, 2021 at 6:41 PM Shree Devi Kumar <shree...@gmail.com>
>> wrote:
>>
>>>
>>> nohup make MODEL_NAME=ben START_MODEL=ben LANG_TYPE=Indic
>>>  GROUND_TRUTH_DIR=data/ben-ground-truth TESSDATA=$HOME/tessdata_best
>>> DEBUG_INTERVAL=-1 training MAX_ITERATIONS=50000 >> data/ben.log &
>>>
>>> Graphs are created using the training log file as well as the validation
>>> log files. Some of these require PRs which have not yet been merged into
>>> the tesstrain repo.
>>>
>>> See
>>> https://github.com/tesseract-ocr/tesstrain/pulls
>>>
>>> For Evaluation reports, I used
>>> https://github.com/eddieantonio/ocreval
>>>
>>>
>>>
>>> On Fri, Jan 1, 2021 at 12:09 PM Soumik Ranjan Dasgupta <
>>> ranjan...@gmail.com> wrote:
>>>
>>>> Hi Shreeshrii,
>>>>
>>>> Can you please tell me the training command you used? Also, how can I
>>>> create the graphs and these other documents?
>>>>
>>>> On Sat, 26 Dec 2020, 18:37 Shree Devi Kumar, <shree...@gmail.com>
>>>> wrote:
>>>>
>>>>> Soumik,
>>>>>
>>>>> I used your groundtruth and trained using ben as the START_MODEL. I
>>>>> got the best results on the validation set of images at around 5000
>>>>> iterations. See the attached accuracy report and CER graph.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 24, 2020 at 8:36 PM Soumik Ranjan Dasgupta <
>>>>> ranjan...@gmail.com> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>> I wanted to fine-tune the ben.traineddata model using some ancient
>>>>>> texts that were supposedly printed with typeset. I have roughly 1k
>>>>>> lines of text and tried the normal fine-tuning approach with around
>>>>>> 25k iterations.
>>>>>> What surprised me most was that, even after packing the traineddata
>>>>>> (character error was around 4%) and testing on an unseen image, the
>>>>>> result was exactly the same. Not a single character was different!
>>>>>> You can find the traineddata, training data, the logs and the source
>>>>>> code at this link:
>>>>>> https://github.com/srdg/unarchived_ben_tess/releases/tag/v0.0.4-alpha
>>>>>>
>>>>>> Can anyone tell me what I am doing wrong here? Do I need to change
>>>>>> a training parameter, increase my training data, or do something
>>>>>> else entirely?
>>>>>>
>>>>>> Best regards,
>>>>>> Soumik
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/1fc044d1-b0ae-45d5-9041-e6fbf8ec5089n%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/1fc044d1-b0ae-45d5-9041-e6fbf8ec5089n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>

