Hello Shree,
Then what is the way to train Sanskrit together with Roman diacritics
and still achieve good accuracy, or is there an alternative way to achieve this?

Regards,

On Thu, Nov 5, 2020 at 8:15 PM Shree Devi Kumar <shreesh...@gmail.com>
wrote:

> Legacy engine training won't work for Devanagari. The cube engine which
> was used in tesseract for Hindi has been removed.
>
> If you are only training for English and diacritics it may work for you.
> But note that there are no fine-tuning options for it. You have to train a
> model from scratch.
>
> ,.....
>
> shapetable, tr, etc. are all files for the legacy engine, 3.0x and before.
>
> The legacy engine is still supported in tesseract4 with --oem 0
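As a rough illustration of the --oem 0 remark above, the sketch below only
builds the command line; it does not run tesseract. The image name, the output
base and the use of san_NKP_int as the language are placeholders taken from or
invented around this thread, and the legacy engine will only load if the
traineddata file actually contains the legacy components.

```python
import shlex


def legacy_ocr_command(image, output_base, lang, tessdata_dir=None):
    """Return an argv list that runs tesseract with the legacy engine."""
    # --oem 0 selects the legacy (non-LSTM) recogniser.
    cmd = ["tesseract", image, output_base, "-l", lang, "--oem", "0"]
    if tessdata_dir is not None:
        # Directory that holds <lang>.traineddata (hypothetical path).
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


if __name__ == "__main__":
    # "page.png" and "out" are placeholder names, not files from this thread.
    print(shlex.join(legacy_ocr_command("page.png", "out", "san_NKP_int")))
```

The list can be passed to subprocess.run() as it is once the placeholder names
are replaced with real files.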
> On Thu, Nov 5, 2020, 17:14 Shree Devi Kumar <shreesh...@gmail.com> wrote:
>
>> Are you trying to train for the legacy tesseract engine?
>>
>> On Thu, Nov 5, 2020, 16:46 shreyansh dwivedi <advocates...@gmail.com>
>> wrote:
>>
>>> Hello Shree, I am attaching the image file, the box file and the train.bash
>>> script to this email, along with the error generated while running the
>>> script. FYI, I am currently using Windows, so I run the bash script in an
>>> MSYS2 terminal.
>>>  font_properties
>>> <https://drive.google.com/file/d/1s8RH4xjLwPjZ_go37F6CF07kT38vqG2s/view?usp=drive_web>
>>>  san_NKP_int.inttemp
>>> <https://drive.google.com/file/d/18Plctl6Ia_dLhE-DMCI6zh0-1fCRy53q/view?usp=drive_web>
>>>  san_NKP_int.normproto
>>> <https://drive.google.com/file/d/1Apbf1nrpXjGYD1-x4XfxFNLSWqMSCasb/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp0.box
>>> <https://drive.google.com/file/d/1V4neOkxouYuoT0p4uSnp3RgqYmx0VfQK/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp0.png
>>> <https://drive.google.com/file/d/1o-XZg3dZSwsFhrJtfuFOHlcpIvJ5ehlM/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp0.tr
>>> <https://drive.google.com/file/d/1rgiQ8tWcYvxYS3MYgSZ19Wi-ulrudl7c/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp1.box
>>> <https://drive.google.com/file/d/1CeTujdd_sFxgxPCj5ojkWc-riE0Jko0U/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp1.png
>>> <https://drive.google.com/file/d/1S-NK7lG40r3aPsN9m8Fhg_JLgAcfZOeD/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp1.tr
>>> <https://drive.google.com/file/d/1MzAaFkFOAGfBsdFVsvpQd9VuD9H9Srn7/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp2.box
>>> <https://drive.google.com/file/d/1l2uVS73hFw6TjyCQeNFkQ8lYf-KBhjO9/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp2.png
>>> <https://drive.google.com/file/d/1ywDR8j0K-ngGvj0WC0LAQYYkG6M64qDS/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp2.tr
>>> <https://drive.google.com/file/d/1pcYoFkJvO0dFaY5OfuEaZwkyI5wjHobd/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp3.box
>>> <https://drive.google.com/file/d/1zn4ZC4ueDryOW_oAslAIHH5di4zYlaWF/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp3.png
>>> <https://drive.google.com/file/d/1j8hecGX9jVAchwpW5VMXCeIl0bvatMKG/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp3.tr
>>> <https://drive.google.com/file/d/1LQJjrQtCRf3vbmPNpiJnwM_x1q0nWYoh/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp4.box
>>> <https://drive.google.com/file/d/1WP3Oa5mxH0YsdM-HUZnBbh-OyEesWZy_/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp4.png
>>> <https://drive.google.com/file/d/1TNkgDppOo3m5XAVb73evWLEFuH-mhtrW/view?usp=drive_web>
>>>  san_NKP_int.ocrb.exp4.tr
>>> <https://drive.google.com/file/d/1hN2ORHCFo47wMw0BrkI77C0bW8ISFCzT/view?usp=drive_web>
>>>  san_NKP_int.pffmtable
>>> <https://drive.google.com/file/d/1aIcJA4B-1yJzj54hcD6n-9eWZYBCCss2/view?usp=drive_web>
>>>  san_NKP_int.shapetable
>>> <https://drive.google.com/file/d/1R4-yD_bMde_KJqGihH3-Uo9nVE6r-SqU/view?usp=drive_web>
>>>  san_NKP_int.traineddata
>>> <https://drive.google.com/file/d/1nvyKsOVLhJs5uP1GcNHIOtqGkIe5Gt87/view?usp=drive_web>
>>>  san_NKP_int.unicharset
>>> <https://drive.google.com/file/d/1BqMN29ZH8lTG9ZwkscmER8XkWQv9EQXm/view?usp=drive_web>
>>>  train.bash
>>> <https://drive.google.com/file/d/1gUhDqGgjJCY5n4fc0ONNL943Qk-M3QeT/view?usp=drive_web>
>>>  unicharset
>>> <https://drive.google.com/file/d/1ZhYZ663FXS2gqegIY2fDG-9IY8-du9Ud/view?usp=drive_web>
>>> Below is the error screenshot generated while running the bash script:
>>> [image: image.png]
>>>
>>> [image: image.png]
>>>
>>>
>>> On Sat, Oct 31, 2020 at 4:20 PM Shree Devi Kumar <shreesh...@gmail.com>
>>> wrote:
>>>
>>>> >ṣ -> it recognises as ş
>>>> I cannot reproduce the issue.  I am getting the following
>>>>
>>>> Line 120: praise of Viṣṇu. Lz. 1388.
>>>> Line 147: lakṣmī XXXIX. 51.
>>>>
>>>> Complete output is attached. It uses
>>>> https://github.com/Shreeshrii/tess5training-sanskrit-iast/blob/main/tessdata/fast/Sanskrit-1017-fast.traineddata
>>>>
>>>> Hello Shree,
>>>> I have an image containing Sanskrit text and Roman text with the
>>>> diacritics a, ā, ś, Ś, ṛ, ṇ, ṃ, ū, ī, ṭ, ṅ, ḍ, ṛ, ṣ. I am using the
>>>> sanskrit_int.traineddata created by you. It recognises Sanskrit text quite
>>>> well in properly scanned images, but in the diacritical part only a few
>>>> characters are identified, namely ā and ū. For example,
>>>> ṣ -> it recognises as ş
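The two characters in this report look alike but are different code points,
which is easy to miss when correcting box files by eye. A minimal sketch using
only Python's standard unicodedata module shows the difference; it explains
what the confusion is, not why the model produces it.

```python
import unicodedata


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def combining_marks(ch):
    """Return the names of the combining marks in the NFD form of ch."""
    decomposed = unicodedata.normalize("NFD", ch)
    return [unicodedata.name(c) for c in decomposed if unicodedata.combining(c)]


if __name__ == "__main__":
    expected = "\u1e63"    # the IAST character that should be recognised
    recognised = "\u015f"  # the character reported in the OCR output
    print(describe(expected))    # U+1E63 LATIN SMALL LETTER S WITH DOT BELOW
    print(describe(recognised))  # U+015F LATIN SMALL LETTER S WITH CEDILLA
    print(combining_marks(expected), combining_marks(recognised))
```

If the dot-below character is absent from a model's character set, the nearest
shape it can output is a plausible result, so it is worth checking the
unicharset before correcting boxes by hand.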
>>>>
>>>> Right now I am using QTBoxEditor to correct wrongly recognised
>>>> characters like the one above.
>>>>
>>>> One of the rules for training a new language model is the naming
>>>> convention for the images:
>>>> [language name].[font name].exp[number].[file extension]
>>>>
>>>> What is the font name here, and how do I identify which font is used
>>>> in an image? For better understanding I am attaching the image file.
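To make the naming convention concrete, here is a small sketch that splits a
training file name into its four parts. The function and class names are made
up for illustration. As far as I understand the legacy training process, the
font field is a label chosen by the person training, and it has to match an
entry in font_properties; it does not have to be detected from a scanned image.

```python
import re
from typing import NamedTuple


class TrainingName(NamedTuple):
    lang: str
    font: str
    exp: int
    ext: str


# [language name].[font name].exp[number].[file extension]
_PATTERN = re.compile(
    r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<exp>\d+)\.(?P<ext>[^.]+)$"
)


def parse_training_name(filename):
    """Split a training file name into language, font, exp number, extension."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a training file name: {filename!r}")
    return TrainingName(
        match["lang"], match["font"], int(match["exp"]), match["ext"]
    )


if __name__ == "__main__":
    # File name taken from the attachment list in this thread.
    print(parse_training_name("san_NKP_int.ocrb.exp0.box"))
```

For the files attached in this thread the language part is san_NKP_int and the
font part is ocrb.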
>>>>
>>>> On Mon, Oct 19, 2020 at 4:45 PM Shree Devi Kumar <shreesh...@gmail.com>
>>>> wrote:
>>>>
>>>>> Please share the groundtruth for the test images also.
>>>>>
>>>>> Yes, you can certainly try to train on the basis of these models.
>>>>>
>>>>>
>>>>> On Mon, Oct 19, 2020, 15:51 shreyansh dwivedi <advocates...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>   Hello Shree,
>>>>>> Subh Navratri,
>>>>>> I used the trained models built by you, but unfortunately they are not
>>>>>> giving good results. Please refer to the picture and the text inscribed
>>>>>> in it. Could we build a model on the basis of it? PFA.
>>>>>>
>>>>>> Regards,
>>>>>> Shreyansh Dwivedi
>>>>>>
>>>>>> ---------- Forwarded message ---------
>>>>>> From: Shree Devi Kumar <shreesh...@gmail.com>
>>>>>> Date: Thu, Oct 8, 2020 at 6:18 PM
>>>>>> Subject: Re: [tesseract-ocr] Diacriticals Training
>>>>>> To: tesseract-ocr <tesseract-ocr@googlegroups.com>
>>>>>>
>>>>>>
>>>>>> I have uploaded the results of various trainings for IAST (with
>>>>>> diacritics) and Devanagari for Sanskrit at
>>>>>> https://github.com/Shreeshrii/tess5training-sanskrit-iast/tree/main/tessdata/best
>>>>>> . The traineddata files and the corresponding lstm-unicharset have been
>>>>>> uploaded there.
>>>>>>
>>>>>> The training has been done mostly with line images of synthetic
>>>>>> training data in various fonts. On evaluation datasets of synthetic
>>>>>> training data, not seen during training, I get a CER of 2-3%. I am
>>>>>> curious to know how well these perform with real life images.
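For anyone who wants to compare OCR output against ground truth for their own
test images, a minimal sketch of a character error rate (CER) calculation
follows. It assumes the common definition, edit distance divided by the length
of the ground truth, and it normalises both strings to NFC first. This is not
necessarily how the 2-3% figure above was computed.

```python
import unicodedata


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def cer(ground_truth, ocr_output):
    """Character error rate relative to the ground truth length."""
    # NFC so that composed and decomposed diacritics compare as equal.
    ref = unicodedata.normalize("NFC", ground_truth)
    hyp = unicodedata.normalize("NFC", ocr_output)
    if not ref:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One wrong character out of six: s with cedilla instead of dot below.
    print(cer("lak\u1e63m\u012b", "lak\u015fm\u012b"))
```

A single confused diacritic in a six-character word already gives a CER of
about 17% for that word, so averages should be taken over whole pages or lines.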
>>>>>>
>>>>>> I would appreciate it if those who are testing could send me a few of
>>>>>> their test images along with the ground truth text.
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 28, 2020 at 12:19 PM Shree Devi Kumar <
>>>>>> shreesh...@gmail.com> wrote:
>>>>>>
>>>>>>> I am currently running a training run based on synthetic training
>>>>>>> data for Sanskrit to support both Devanagari script with Vedic accents
>>>>>>> as well as IAST (Roman with diacritics support). I will share the
>>>>>>> traineddata for you and others who are interested to test how well it
>>>>>>> works with real life images.
>>>>>>>
>>>>>>> On Mon, Sep 28, 2020, 10:43 shreyansh dwivedi <
>>>>>>> advocates...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello everyone,
>>>>>>>> I want to train some diacritics which are not present in the
>>>>>>>> latin.trained model. Apart from Latin I tried the Vietnamese and
>>>>>>>> Latvian trained models, but some of the diacritics are missing in
>>>>>>>> those models too. Some of the missing characters that I need to
>>>>>>>> recognise are listed below.
>>>>>>>> ṭ
>>>>>>>> Ṭ
>>>>>>>> ṅ
>>>>>>>> ṭh
>>>>>>>> ḍ
>>>>>>>> ḍh
>>>>>>>> ṇ
>>>>>>>> ṃ
>>>>>>>> ṣ
>>>>>>>> Ḥ
>>>>>>>> ḥ
>>>>>>>> I want to train the above diacritics so that the Tesseract engine
>>>>>>>> recognises these characters in the text image.
>>>>>>>> Any help would be appreciated; an explanation from scratch would be a
>>>>>>>> great way to understand.
>>>>>>>> Thank you!
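One way to confirm which of the characters listed above a given model cannot
output is to compare them with the model's unicharset. The sketch below relies
only on two properties of that file: the first line is the entry count, and
every following line starts with the character itself. The remaining columns in
the sample are placeholders, and the function names are made up for
illustration.

```python
import unicodedata

# Characters from the list above; "ṭh" and "ḍh" are two characters each.
REQUIRED = ["ṭ", "Ṭ", "ṅ", "ṭh", "ḍ", "ḍh", "ṇ", "ṃ", "ṣ", "Ḥ", "ḥ"]


def unicharset_entries(text):
    """Collect the first field of every line after the count line."""
    entries = set()
    for line in text.splitlines()[1:]:
        if line.strip():
            first_field = line.split(" ", 1)[0]
            entries.add(unicodedata.normalize("NFC", first_field))
    return entries


def missing_characters(required, entries):
    """Return required characters that do not appear in the unicharset."""
    missing = []
    for item in required:
        for ch in unicodedata.normalize("NFC", item):
            if ch not in entries and ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    # Tiny made-up sample, not taken from a real model.
    sample = "3\nNULL 0\nh 3\n\u1e6d 3\n"
    print(missing_characters(REQUIRED, unicharset_entries(sample)))
```

Characters reported as missing cannot be produced by that model at all, so
they need training or fine-tuning rather than post-correction.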
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/CAMREWd6R%2Bec5r%3D77%2BRWGM7PUKZPqqJT%2BkNX6r9zwijvW5sxykQ%40mail.gmail.com
>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/CAMREWd6R%2Bec5r%3D77%2BRWGM7PUKZPqqJT%2BkNX6r9zwijvW5sxykQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> ____________________________________________________________
>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>

