[tesseract-ocr] training doubts

2020-10-20 Thread Kumar Rajwani
Hey, I need a little help, as I have to train Tesseract on my documents. I have already read some training issues and have steps that I can perform.
1. !tesseract "document.png" "document" -l eng --psm 11 wordstrbox
This gives me line-level boxes with correct OCR. Copy image file and

[tesseract-ocr] not training on image after loading data

2021-02-05 Thread Kumar Rajwani
Hi, I am trying to fine-tune eng.traineddata on my images. I have tried to train, but every time I get stuck somewhere. Can you tell me how I can proceed further?
Current steps:
Step 1: make box files
%%bash
for file in *.jpg; do
  echo $file
  base=`basename $file .jpg`
  tesseract $file $base lstmbox
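The box-file step quoted in this message can also be driven from Python, which is the notebook's own language. The sketch below only mirrors the `tesseract <image> <base> lstmbox` call shown in the post; the function names and the `dry_run` switch are illustrative assumptions, not part of Tesseract or of the poster's setup.

```python
from pathlib import Path
from typing import List
import subprocess


def lstmbox_command(image: Path) -> List[str]:
    """Build the `tesseract <image> <base> lstmbox` call quoted in the post."""
    # Strip the extension, like `basename $file .jpg`, but keep the directory.
    base = image.with_suffix("")
    return ["tesseract", str(image), str(base), "lstmbox"]


def make_box_files(folder: str, dry_run: bool = True) -> List[List[str]]:
    """Collect (and optionally run) one lstmbox command per .jpg in `folder`."""
    commands = [lstmbox_command(p) for p in sorted(Path(folder).glob("*.jpg"))]
    if not dry_run:
        for cmd in commands:
            # Requires the tesseract binary on PATH; raises if a call fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `make_box_files(".")` with the default `dry_run=True` only returns the commands, which makes it easy to inspect them before running anything.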

Re: [tesseract-ocr] not training on image after loading data

2021-02-05 Thread Kumar Rajwani
ebruary 5, 2021 at 4:28:14 PM UTC+5:30 shree wrote:
> Add the following to your lstmtraining command and see.
> --debug_interval -1
>
> On Fri, Feb 5, 2021 at 4:05 PM Kumar Rajwani wrote:
>
>> Hi,
>> I am trying to fine-tune eng.traineddata as per my image
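As a rough illustration of the suggestion quoted here, the helper below appends `--debug_interval -1` to an existing lstmtraining argument list, replacing any earlier value. The base command is a hypothetical placeholder: its output path and iteration count are assumptions, since the poster's actual lstmtraining command is not shown in this listing.

```python
from typing import List, Sequence


def with_debug_interval(args: Sequence[str], interval: int = -1) -> List[str]:
    """Return a copy of an lstmtraining argument list with --debug_interval set."""
    cleaned: List[str] = []
    skip_next = False
    for arg in args:
        if skip_next:
            # Drop the value that belonged to a previous --debug_interval.
            skip_next = False
            continue
        if arg == "--debug_interval":
            skip_next = True
            continue
        cleaned.append(arg)
    return cleaned + ["--debug_interval", str(interval)]


# Hypothetical base command; these values are placeholders, not the poster's.
base = ["lstmtraining", "--model_output", "output/eng_ft", "--max_iterations", "400"]
print(" ".join(with_debug_interval(base)))
```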

Re: [tesseract-ocr] not training on image after loading data

2021-02-05 Thread Kumar Rajwani
> 1 at 4:37 PM Kumar Rajwani wrote:
>
>> Hi,
>> I have tried that; it shows the following output:
>> Starting sh -c "trap 'kill %1' 0 1 2 ; java -Xms1024m -Xmx2048m -jar ./ScrollView.jar & wait"
>> ScrollView: Waiting for server..

Re: [tesseract-ocr] not training on image after loading data

2021-02-05 Thread Kumar Rajwani
I tried the same thing in Tesseract 4, which gets stuck at the following line:
Compute CTC targets failed!
On Friday, February 5, 2021 at 5:04:42 PM UTC+5:30 Kumar Rajwani wrote:
> !tesseract -v
> tesseract 5.0.0-alpha-20201231-171-g04173
> leptonica-1.78.0
> libgif 5.1.4 : libjpeg

Re: [tesseract-ocr] not training on image after loading data

2021-02-05 Thread Kumar Rajwani
> 't think training is going to help you in this. eng.traineddata should be able to recognize it quite well. You should select the different areas of interest and just OCR those sections.
>
> On Fri, Feb 5, 2021 at 5:33 PM Kumar Rajwani wrote:
>
>> I have tried to

Re: [tesseract-ocr] not training on image after loading data

2021-02-05 Thread Kumar Rajwani
ary 5, 2021 at 5:50:30 PM UTC+5:30 Kumar Rajwani wrote:
> The main thing is that I want to learn about training Tesseract at the image level, so can you please tell me how I can proceed further? I want to know where the main problem is.
>
> On Friday, February 5, 2021 at 5:46:22 PM UTC+5

Re: [tesseract-ocr] not training on image after loading data

2021-02-05 Thread Kumar Rajwani
> ://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
>
> https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
>
> On Fri, Feb 5, 2021 at 6:14 PM Kumar Rajwani wrote:
>
>> i h

Re: [tesseract-ocr] not training on image after loading data

2021-02-06 Thread Kumar Rajwani
Hey, can you please tell me how I can improve the text detection for the same kind of images?
On Friday, February 5, 2021 at 8:38:31 PM UTC+5:30 Kumar Rajwani wrote:
> Thanks for this. I know about the usage of Tesseract. I have multiple images where I can't improve image q

Re: [tesseract-ocr] not training on image after loading data

2021-02-08 Thread Kumar Rajwani
Hey, I am still waiting for your reply. Can you please help resolve my doubts?
On Sunday, February 7, 2021 at 8:13:56 AM UTC+5:30 Kumar Rajwani wrote:
> Hey, can you please tell me how I can improve the text detection for the same kind of images?
>
> On Friday, February 5, 2021 at 8:38

[tesseract-ocr] After training words aren't predicted correct

2021-02-10 Thread Kumar Rajwani
Hey, I am able to train Tesseract 5 on my images, and I have seen some improvement. But sometimes words are predicted incorrectly, for example "your" as "vour" and "commercial" as "commer cal". Can you guide me on how to solve this?
-- You received this message because you are subscribed t

Re: [tesseract-ocr] not training on image after loading data

2021-02-11 Thread Kumar Rajwani
> ----
> web:http://www.hobbelt.com/
> http://www.hebbut.net/
> mail: g...@hobbelt.com
> mobile: +31-6-11 120 978
> ------
>
> On Mon, Feb 8, 2021 at 1:47 PM Kumar Rajwani wrote:

Re: [tesseract-ocr] not training on image after loading data

2021-03-04 Thread Kumar Rajwani
> lls spanning columns or rows. So more research is needed before I'd code that preprocess.
>
> Another issue with the line detection + removal/zoning techniques would be making sure the lines are either near-perfect horizontal and vertical all (*orienting*/*deskewing* the ima

[tesseract-ocr] tesseract 5 alpha is not working

2021-03-22 Thread Kumar Rajwani
Installation:
!apt-get install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr
!sudo apt-get install libenchant1c2a
!sudo add-apt-repository -y ppa:alex-p/tesseract-ocr-devel
!sudo apt-get update
# !sudo apt-get install tesseract-ocr
!sudo apt install -y tes

[tesseract-ocr] downgrade to last tessract alpha version tesseract 5.0.0-alpha-20201231-246-gfe61

2021-03-23 Thread Kumar Rajwani
The latest push works fine, but when the image is blurry or has some noise, it cannot process the image. It shows "Detected 12 diacritics". The previous version was working fine with my images.
Installation:
!apt-get install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils psto

Re: [tesseract-ocr] downgrade to last tessract alpha version tesseract 5.0.0-alpha-20201231-246-gfe61

2021-03-23 Thread Kumar Rajwani
Thanks shree, I will report it in the repo.
On Tuesday, March 23, 2021 at 1:54:38 PM UTC+5:30 shree wrote:
> Please report it as an issue in the Tesseract repo.
>
> On Tue, Mar 23, 2021, 13:46 Kumar Rajwani wrote:
>
>> The latest push works fine, but when the image is blurry or has some

[tesseract-ocr] detect decimal point in amount with psm 11

2021-04-21 Thread Kumar Rajwani
Hey, I am using Tesseract to identify amounts in my forms. See the image below for a sample. I get the correct amounts, including the decimals, with psm 6, but with psm 11 I get the following output. I have to use psm 11 because it identifies more text than psm 6 in my images.
250,941 00 00 -7

Re: [tesseract-ocr] detect decimal point in amount with psm 11

2021-04-21 Thread Kumar Rajwani
> 2,860.00
> $ 0.00
> $ 163,447.00
>
> The legacy engine could be better for numbers.
>
> Zdenko
>
> On Wed, Apr 21, 2021 at 14:10 Kumar Rajwani wrote:
>
>> Hey,
>> I am using Tesseract to identify amounts in my forms. See the image below for a sample.

Re: [tesseract-ocr] detect decimal point in amount with psm 11

2021-04-21 Thread Kumar Rajwani
> e result for the image you provided.
> 2. I suggest you use another oem.
> 3. I know that invoice digitizers use different parameters for parsing numbers.
>
> Zdenko
>
> On Wed, Apr 21, 2021 at 17:45 Kumar Rajwani wrote:
>
>> Hi Zdenop, As I sai

Re: [tesseract-ocr] detect decimal point in amount with psm 11

2021-04-23 Thread Kumar Rajwani
Can you tell me whether there is any way to make psm 11 recognize numbers well? That would be great.
On Thursday, April 22, 2021 at 12:11:59 PM UTC+5:30 Kumar Rajwani wrote:
> Hey zdenop, that was the portion of the full image that was not detected properly by Tesseract. In full

Re: [tesseract-ocr] detect decimal point in amount with psm 11

2021-04-23 Thread Kumar Rajwani
Hey, I have tested the parameter textord_dotmatrix_gap=3, which I think joins the number with its decimal part. Can you please tell me whether I am right? Thanks
On Friday, April 23, 2021 at 1:46:44 PM UTC+5:30 Kumar Rajwani wrote:
> Hi, can you please look into this image so we can get more clear i

Re: [tesseract-ocr] detect decimal point in amount with psm 11

2021-04-23 Thread Kumar Rajwani
or me? Thanks
On Friday, April 23, 2021 at 1:46:44 PM UTC+5:30 Kumar Rajwani wrote:
> Hi, can you please look into this image so we can get a clearer idea of why I want to go with psm 11.
> If you try this image with psm 6, it will miss the first line and the date will be wrong als

Re: [tesseract-ocr] detect decimal point in amount with psm 11

2021-04-26 Thread Kumar Rajwani
Hey, can you please suggest how I can achieve better results?
On Friday, April 23, 2021 at 9:10:51 PM UTC+5:30 Kumar Rajwani wrote:
> Hey zdenop, can you please see this command? It's working for me:
> tesseract /content/img.png out2 --psm 11 -c textord_min_linesize=3
> Ple
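To compare page segmentation modes and `-c` config variables like the ones discussed in this thread, it can help to build the command line programmatically. This is a minimal sketch: the helper name `tesseract_command` is an assumption, and the only values used are the ones quoted in the message (`/content/img.png`, `out2`, `--psm 11`, `textord_min_linesize=3`).

```python
from typing import Dict, List, Optional


def tesseract_command(image: str, output_base: str, psm: int,
                      config_vars: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a tesseract CLI call with a page segmentation mode and -c variables."""
    cmd = ["tesseract", image, output_base, "--psm", str(psm)]
    for name, value in (config_vars or {}).items():
        # Each config variable is passed as its own `-c name=value` pair.
        cmd += ["-c", f"{name}={value}"]
    return cmd


# Reproduces the command quoted in the message.
cmd = tesseract_command("/content/img.png", "out2", 11, {"textord_min_linesize": "3"})
print(" ".join(cmd))
```

Building one command per mode, for example psm 6 and psm 11 on the same image, and passing each list to `subprocess.run(cmd, check=True)` gives two output files that can be compared side by side, provided the tesseract binary is installed.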

[tesseract-ocr] finetune model huge time difference

2021-04-28 Thread Kumar Rajwani
[image: Capture.PNG]
Hey, can you please explain this behaviour to me? The only difference is the eng2 model, which I fine-tuned on my documents; eng is the default Tesseract model.

[tesseract-ocr] Speed doubts

2021-06-07 Thread Kumar Rajwani
I have also posted an issue in the Tesseract repo; you can look here: https://github.com/tesseract-ocr/tesseract/issues/3450
Environment
- *Tesseract Version*: tesseract 5.0.0-alpha-20210401-94-ga968
- *Platform*: Ubuntu 18.04, Colab
I have to reduce my trained model's run time to that of the default tessera

[tesseract-ocr] Tesseract training dataset

2021-07-06 Thread Kumar Rajwani
May I know which dataset Tesseract was trained on? If you know of any other OCR dataset of black-and-white images, please provide a link. Thanks

[tesseract-ocr] Re: Tesseract training dataset

2021-08-19 Thread Kumar Rajwani
No, I haven't found any dataset. I want to know the initial dataset that Tesseract was trained on. If you want to fine-tune, Tesseract training creates synthetic data from a text file. If you find any dataset, please share it here.
On Thursday, August 19, 2021 at 3:08:55 PM UTC+5:30 nebiy...@gmail.co