Re: [tesseract-ocr] Tesseract Performance

2021-01-07 Thread Soumik Ranjan Dasgupta
ot yet been merged in >> tesstrain repo. >> >> See >> https://github.com/tesseract-ocr/tesstrain/pulls >> >> For Evaluation reports, I used >> https://github.com/eddieantonio/ocreval >> >> >> >> On Fri, Jan 1, 2021 at 12:09 PM Soumik R

Re: [tesseract-ocr] Tesseract Performance

2020-12-31 Thread Soumik Ranjan Dasgupta
sults on the validation set of images at around 5000 iterations. See > attached accuracy report and CER graph. > > > > On Thu, Dec 24, 2020 at 8:36 PM Soumik Ranjan Dasgupta < > ranjansou...@gmail.com> wrote: > >> Hi everyone, >> I wanted to fine-tune the ben.t

[tesseract-ocr] Tesseract Performance

2020-12-24 Thread Soumik Ranjan Dasgupta
Hi everyone, I wanted to fine-tune the ben.traineddata model using some ancient texts that were supposedly typeset. I have roughly 1k lines of text and tried the normal fine-tuning approach with around 25k iterations. The thing that surprised me the most was even after

Re: [tesseract-ocr] How to train tesseract with new script?

2019-04-05 Thread Soumik Ranjan Dasgupta
If you have a font of the said script alphabet, yes, I think it is possible. On Thu, Apr 4, 2019, 11:01 PM Moni wrote: > Hi all > I am planning to train the ancient scripts for language translation. Is > there any alternate rather than amazon mechanical turk to train the > character? in stroke f

Re: [tesseract-ocr] Android app using Tesseract v4 for OCR

2019-04-04 Thread Soumik Ranjan Dasgupta
.google.com/store/apps/details?id=com.renard.ocr&hl=en > > > /René > > > On Thu, 4 Apr 2019 at 14:41, Soumik Ranjan Dasgupta < > srd1...@cse.jgec.ac.in> wrote: > >> Of course, I'm sorry I couldn't place it properly. >> What I am aiming t

Re: [tesseract-ocr] Android app using Tesseract v4 for OCR

2019-04-04 Thread Soumik Ranjan Dasgupta
s orthogonal to me. Can you elaborate? > > > /René > > > On Wed, 3 Apr 2019 at 21:41, Soumik Ranjan Dasgupta < > srd1...@cse.jgec.ac.in> wrote: > >> The reason is that I, and as I see, a few other people would like to be >> able to use Tesseract cross-platfo

[tesseract-ocr] WEBAPP using TESSERACT : traineddata files keep getting deleted

2019-04-03 Thread Soumik Ranjan Dasgupta
lp and think it would be a good thing to have a webapp for this, please give it an upvote so the question gets boosted (that is how Stack Overflow works, I think). -- Regards, Soumik Ranjan Dasgupta -- You received this message because you are subscribed to the Google Groups "tesseract-ocr"

Re: [tesseract-ocr] Android app using Tesseract v4 for OCR

2019-04-03 Thread Soumik Ranjan Dasgupta
> > René Hansen > > > > On Sun, 31 Mar 2019 at 20:25 Greg Dunkel wrote: > >> Please post to list. I am not the only one who would be interested in >> such an app. >> >> On Sun, Mar 31, 2019, 10:34 AM Soumik Ranjan Dasgupta < >> srd1...@cse.jg

Re: [tesseract-ocr] Re: Tesseract 4 Training Tutorials

2019-04-02 Thread Soumik Ranjan Dasgupta
ups.google.com/d/msgid/tesseract-ocr/ab0d9c8e-58d9-475a-8abe-78b5a778769d%40googlegroups.com?utm_medium=email&utm_source=footer> > . > For more options, visit https://groups.google.com/d/optout. > -- Regards, Soumik Ranjan Dasgupta -- You received this message because you are s

Re: [tesseract-ocr] Android app using Tesseract v4 for OCR

2019-03-31 Thread Soumik Ranjan Dasgupta
r than > that, it works fine on Android. > > /René > > On Sat, 30 Mar 2019 at 08:01 Soumik Ranjan Dasgupta < > srd1...@cse.jgec.ac.in> wrote: > >> Please update if you complete it and decide to upload it to the Play Store. >> >> On Sat, Mar 30, 2019 at 11:58 AM prerna

Re: [tesseract-ocr] Android app using Tesseract v4 for OCR

2019-03-30 Thread Soumik Ranjan Dasgupta
Please update if you complete it and decide to upload it to the Play Store. On Sat, Mar 30, 2019 at 11:58 AM prerna kumari wrote: > TextFairy is there and I'm also working on an android application using > tesseract. > > On Sat 30 Mar, 2019, 11:47 AM Soumik Ranjan Dasgupta, < >

[tesseract-ocr] Android app using Tesseract v4 for OCR

2019-03-29 Thread Soumik Ranjan Dasgupta
Hi, I'd like to know whether there is any Android application for OCR that uses Tesseract 4. Regards, Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Tesseract Latest version

2018-11-02 Thread Soumik Ranjan Dasgupta
On Nov 2, 2018 9:34 PM, "Nikhil Kumar" wrote: > Hello. I am using Tesseract 4.0. I want to know whether this is the latest version or whether an update has been released since. No, Tesseract 4 is the latest version. > If so, can you please provide the link and the update instructions? Thank you.

Re: [tesseract-ocr] Re: train more fonts on trained model fas in tesseract

2018-10-24 Thread Soumik Ranjan Dasgupta
> On Wednesday, 17 October 2018 20:18:26 UTC+5:30, Soumik Ranjan Dasgupta > wrote: >> >> You'll need to install the fonts on your system and add them to >> font_properties and language-specific.sh for fine-tuning or training from >> scratch. For further details pl

Re: [tesseract-ocr] Re: train more fonts on trained model fas in tesseract

2018-10-17 Thread Soumik Ranjan Dasgupta
> Hello, > > Thanks for the prompt reply. I want to train Tesseract 4.0 alpha for the > E13B font. How could I train it? Please share the knowledge. > > On Tuesday, October 16, 2018 at 1:57:17 PM UTC+5:30, Soumik Ranjan > Dasgupta wrote: >> >> Please see >> https://github.

Re: [tesseract-ocr] Re: train more fonts on trained model fas in tesseract

2018-10-16 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Making custom traineddata

2018-10-16 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Convert image to text shows arrow instead of empty string

2018-10-15 Thread Soumik Ranjan Dasgupta
helps. Can anyone else confirm this? On Mon, Oct 15, 2018 at 4:37 PM Magdalena Orzechowska wrote: > Actually when You open out.txt file in Notepad it's not empty. There is an > arrow there. The same arrow appears in PyCharm output. Previously it was > empty. > > niedz., 14

Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-15 Thread Soumik Ranjan Dasgupta
hon modules and tesseract is c++ project, that > are distributed with other tools (depending on linux distribution) - on > Ubuntu it should be apt. > > Zdenko > > > po 15. 10. 2018 o 10:09 Soumik Ranjan Dasgupta > napísal(a): > >> Is there any way tesseract c

Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-15 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Training the LSTM language model explicitly in an unsupervised manner

2018-10-15 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Convert image to text shows arrow instead of empty string

2018-10-14 Thread Soumik Ranjan Dasgupta
ct C:/tmp\rip_message.png C:\automation\ext\ocr\out > I received empty file (out file), now for v4.0.0-rc2.20181008 the out file > looks like in attachment. > > > czw., 11 paź 2018 o 18:18 Soumik Ranjan Dasgupta > napisał(a): > >> I tried to reproduce the error and it did not oc

Re: [tesseract-ocr] Hand Written and System generated Text

2018-10-12 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Undefined Reference errors when building Tesseract OCR

2018-10-12 Thread Soumik Ranjan Dasgupta
seract.so.4.0.0 > Error undefined reference to > `pixSetPixel' Tesseract.vgdbcmake libtesseract.so.4.0.0 > Error undefined reference to > `composeRGBPixel' Tesseract.vgdbcmake libtesseract.so.4.0.0 > Error undefined reference to > `pixWriteMem' Tesseract.vgdbcmake libtesseract.

Re: [tesseract-ocr] Failed to get Text extraction

2018-10-12 Thread Soumik Ranjan Dasgupta
Soumik Ranjan Dasgupta < srd1...@cse.jgec.ac.in> wrote: > Did you try using Tesseract version 4? > -- Regards, Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Failed to get Text extraction

2018-10-12 Thread Soumik Ranjan Dasgupta
Did you try using Tesseract version 4?

Re: [tesseract-ocr] Multi language (English + Arabic ) Tesseract .Net

2018-10-12 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Generate box file for JPN_VERT?

2018-10-12 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Using Tesseract with openCV Mat

2018-10-12 Thread Soumik Ranjan Dasgupta
I guess it must be a > problem with Tesseract accessing the OpenCV mats wherever they exist, which > I'll have to try and work out. > > Is there a way to authorise access? I'm using cmake to compile on a > raspberry pi > > On Fri, Oct 12, 2018 at 3:08 AM Soumik Ran

Re: [tesseract-ocr] Word confidence versus symbol confidence

2018-10-12 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] need help: tesseract 4 faild to detect <<<<<<< symbols

2018-10-12 Thread Soumik Ranjan Dasgupta
t traineddata (OCR-B), you can find one for older > version of tesseract somewhere in the internet, or train one for tesseract > 4 by yourself." > > and it works > > On Thu, Oct 11, 2018 at 9:40 PM Soumik Ranjan Dasgupta < > srd1...@cse.jgec.ac.in> wrote: > >>

Re: [tesseract-ocr] traineddata compatibility

2018-10-12 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Tesseract 4.0 confidence

2018-10-11 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Convert image to text shows arrow instead of empty string

2018-10-11 Thread Soumik Ranjan Dasgupta
I tried to reproduce the error and it did not occur here. Could you be a bit more specific? What do you mean by "orange message"? Just to clarify, I used tesseract image.jpg stdout and I got an empty string in return.
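The check described above can be sketched in Python. This is only an illustration, assuming a `tesseract` executable is on the PATH; the helper names are hypothetical and not from the thread. The emptiness test ignores whitespace, including a trailing form feed character, which is one possible (unconfirmed here) source of a stray symbol in otherwise empty output.

```python
import shutil
import subprocess


def build_command(image_path):
    # Same invocation as in the message: tesseract <image> stdout
    return ["tesseract", image_path, "stdout"]


def ocr_to_string(image_path):
    # Run the tesseract CLI and return whatever it writes to standard output.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def is_effectively_empty(text):
    # str.strip() removes all whitespace, including form feed ("\x0c").
    return text.strip() == ""
```

Calling `is_effectively_empty(ocr_to_string("image.jpg"))` would then report whether the image produced any visible text.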

Re: [tesseract-ocr] Re: increase the quality of image so that it extracts proper text from it.

2018-10-11 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] need help: tesseract 4 faild to detect <<<<<<< symbols

2018-10-11 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Re: error in running tesseract with API example

2018-10-11 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Using Tesseract with openCV Mat

2018-10-11 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Getting time from image

2018-10-11 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] What i need to do fine tuning for only numbers and specific font?

2018-09-16 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] Tesseract4 net spec

2018-09-13 Thread Soumik Ranjan Dasgupta
Please view https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs for details. Hope this helps. On Thu, Sep 13, 2018, 6:02 PM Raniem wrote: > Hello All.. > > > This might be a dummy question but I couldn't find a documentation > explaining the current tesseract4 net spec. > > IndexLayer > 0

Re: [tesseract-ocr] What i need to do fine tuning for only numbers and specific font?

2018-08-28 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] What i need to do fine tuning for only numbers and specific font?

2018-08-24 Thread Soumik Ranjan Dasgupta
You could try changing the training text to one consisting only of numbers. Tesseract v4 has an option for training with a custom font list; please refer to the wiki. On Sat, Aug 25, 2018, 10:17 AM Yasin Nazlıcan wrote: > Hello everyone, I trained tesseract 3.0 with jessbox tools 3 years ago and > now I
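A digits-only training text of the kind suggested above could be generated along these lines. This is a minimal sketch; the line count, line lengths, and output file name are assumptions chosen for illustration, not values from the thread or the wiki.

```python
import random


def make_digit_lines(num_lines, min_len=4, max_len=12, seed=0):
    # Build lines made only of the characters 0-9, with varying lengths.
    rng = random.Random(seed)
    lines = []
    for _ in range(num_lines):
        length = rng.randint(min_len, max_len)
        lines.append("".join(rng.choice("0123456789") for _ in range(length)))
    return lines


def write_training_text(path, lines):
    # One training line per row, UTF-8 encoded.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

For example, `write_training_text("digits.training_text", make_digit_lines(1000))` writes 1000 numeric lines to a hypothetical file that could then stand in for the usual training text.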

Re: [tesseract-ocr] training handwritten digits

2018-08-09 Thread Soumik Ranjan Dasgupta
On Fri, Aug 10, 2018, 10:13 AM wrote: > Hi Shree and everyone: > > I just noticed that the training process of version 4.00 was updated > recently, now I plan to train handwritten digits using version 4.0, > but before training, I have two questions: > > 1. is it possible to fine tuning handwritt

Re: [tesseract-ocr] Questions about training korean language in tesseract 4.0

2018-07-18 Thread Soumik Ranjan Dasgupta
2) To check the fonts used to generate the traineddata for your language, see training/language-specific.sh and langdata/font_properties under your language code. If I'm not wrong, the language code for Korean is "kor". Check out the langdata/kor directory. On Thu, Jul 19, 2

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-18 Thread Soumik Ranjan Dasgupta
Follow https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 to create the traineddata. Copy the eng.traineddata file to the $TESSDATA_PREFIX directory, and you'll be good to go. On Wed, Jul 18, 2018 at 1:20 PM Soumik Ranjan Dasgupta < srd1...@cse.jgec.ac.in> wrote: >
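The copy step mentioned above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that TESSDATA_PREFIX points at the directory where traineddata files should live; that layout may differ between installations.

```python
import os
import shutil


def install_traineddata(traineddata_path, tessdata_dir=None):
    # Fall back to the TESSDATA_PREFIX environment variable when no
    # target directory is passed explicitly.
    if tessdata_dir is None:
        tessdata_dir = os.environ.get("TESSDATA_PREFIX")
    if not tessdata_dir:
        raise ValueError("TESSDATA_PREFIX is not set and no directory was given")
    os.makedirs(tessdata_dir, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return shutil.copy(traineddata_path, tessdata_dir)
```

With the environment variable set, `install_traineddata("eng.traineddata")` copies the file into that directory and returns the destination path.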

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-18 Thread Soumik Ranjan Dasgupta
e > details of account no and pan no. > > > <https://lh3.googleusercontent.com/-KlAWj6TbcPI/W07exdGvG6I/JGQ/4_32r8dwWVgwCfhM2XT358jkABGAArBoACLcBGAs/s1600/dummy_crop.jpg> > > > On Wednesday, July 18, 2018 at 11:38:42 AM UTC+5:30, Soumik Ranjan > Dasgupta wro

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-17 Thread Soumik Ranjan Dasgupta

Re: [tesseract-ocr] how to update tesseract-ocr from 3.03 v to 4 in linux

2018-07-10 Thread Soumik Ranjan Dasgupta

[tesseract-ocr] Specify network architecture in Tesseract 4

2018-07-04 Thread Soumik Ranjan Dasgupta
I'm trying to create a custom network for training tesseract 4 with the following architecture: 784 - 15 - 10 [3 fully connected layers] I ran this command > *lstmtraining --debug_interval 0 --traineddata > ./digits/digits.traineddata --net_spec '[1,28,28,1 Fs15 O1s10]' > --model_output ~/
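For reference, the size of the intended 784 - 15 - 10 network can be worked out with a short Python sketch. It counts weights and biases for a plain fully connected network over a flattened 28 x 28 input; this describes the architecture stated in the message and is not a claim about what Tesseract actually builds from the net_spec string.

```python
def dense_parameter_count(layer_sizes):
    # Each pair of consecutive layers contributes a weight matrix
    # (fan_in * fan_out) plus one bias per output unit.
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out
    return total


# 28 * 28 = 784 inputs, 15 hidden units, 10 output classes.
sizes = [28 * 28, 15, 10]
print(dense_parameter_count(sizes))  # 11935
```

The first layer accounts for 784 * 15 + 15 = 11775 parameters and the second for 15 * 10 + 10 = 160.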