Hey, I need a little help: I have to train Tesseract on my documents.
I have already read some of the training issues, and these are the steps I
can perform:
1. !tesseract "document.png" "document" -l eng --psm 11 wordstrbox
   This gives me line-level boxes.
   Correct the OCR, copy the image file and
Hi,
I am trying to fine-tune eng.traineddata for my images. I have tried to
train, but every time I get stuck somewhere. Can you tell me how I can
proceed further?
Current steps:
Step 1: make box files
%%bash
for file in *.jpg; do
  echo "$file"
  base=$(basename "$file" .jpg)
  tesseract "$file" "$base" lstmbox
done
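After the box files are corrected, a common next step (not listed above) is to run `tesseract image.jpg base lstm.train` on each image, using the `lstm.train` config that ships with Tesseract, to produce `.lstmf` files, and then list their paths in a plain-text file that lstmtraining reads via `--train_listfile`. A minimal sketch of the listing part, assuming all `.lstmf` files sit in one directory; the helper name `write_train_list` and the file name `train_list.txt` are hypothetical:

```python
from pathlib import Path


def write_train_list(lstmf_dir: str, list_path: str) -> list[str]:
    """Write one absolute .lstmf path per line to list_path (hypothetical helper).

    Paths are sorted so the list is reproducible between runs.
    """
    paths = sorted(str(p.resolve()) for p in Path(lstmf_dir).glob("*.lstmf"))
    # Trailing newline only when there is at least one path.
    Path(list_path).write_text("\n".join(paths) + "\n" if paths else "")
    return paths
```

For example, `write_train_list(".", "train_list.txt")` lists every `.lstmf` file in the current folder.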
On Friday, February 5, 2021 at 4:28:14 PM UTC+5:30 shree wrote:
> Add the following to your lstmtraining command and see.
> --debug_interval -1
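For reference, a fine-tuning command that includes the suggested `--debug_interval -1` flag could be assembled as below. This is a sketch, not a verified recipe: the flag names are standard lstmtraining options, but the file names (`eng.lstm`, extracted beforehand with `combine_tessdata -e eng.traineddata eng.lstm`, plus `train_list.txt` and `output/eng2`) and the iteration count are placeholder assumptions.

```python
def lstmtraining_args(
    start_model: str,
    traineddata: str,
    list_file: str,
    output_base: str,
    max_iterations: int = 400,
) -> list[str]:
    """Build an lstmtraining command line for fine-tuning (all paths are placeholders)."""
    return [
        "lstmtraining",
        "--continue_from", start_model,   # LSTM model extracted from eng.traineddata
        "--traineddata", traineddata,     # the original eng.traineddata
        "--train_listfile", list_file,    # one .lstmf path per line
        "--model_output", output_base,    # prefix for the checkpoints written during training
        "--max_iterations", str(max_iterations),
        "--debug_interval", "-1",         # the flag suggested in the reply above
    ]
```

The returned list can be passed to `subprocess.run(args, check=True)` once lstmtraining is on the PATH; as far as I know, a value of -1 prints per-iteration debug text instead of opening the ScrollView window.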
>
>
>
> On Fri, Feb 5, 2021 at 4:05 PM Kumar Rajwani
> wrote:
>
>> Hi,
>> I am trying to fine-tune eng.traineddata as per my image
1 at 4:37 PM Kumar Rajwani
> wrote:
>
>> Hi,
>> I have tried that; it shows the following output:
>> Starting sh -c "trap 'kill %1' 0 1 2 ; java -Xms1024m -Xmx2048m -jar
>> ./ScrollView.jar & wait"
>> ScrollView: Waiting for server..
I have tried to do the same thing in Tesseract 4, which gets stuck at the following line:
Compute CTC targets failed!
On Friday, February 5, 2021 at 5:04:42 PM UTC+5:30 Kumar Rajwani wrote:
> !tesseract -v
> tesseract 5.0.0-alpha-20201231-171-g04173
> leptonica-1.78.0
> libgif 5.1.4 : libjpeg
> I don't think training is going
> to help you in this. eng.traineddata should be able to recognize it quite
> well. You should select the different areas of interest and just OCR those
> sections.
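One way to follow this advice is to define the regions of interest once, relative to the page size, and crop them before running OCR. The sketch below only converts fractional regions into the (left, upper, right, lower) pixel boxes that Pillow's `Image.crop` accepts; the region names and coordinates in `REGIONS` are hypothetical and would have to be measured on your own forms.

```python
def to_pixel_box(frac_box, width, height):
    """Convert (left, top, right, bottom) fractions of the page into pixel coordinates."""
    left, top, right, bottom = frac_box
    if not (0 <= left < right <= 1 and 0 <= top < bottom <= 1):
        raise ValueError(f"invalid fractional box: {frac_box}")
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))


# Hypothetical regions for one form layout, as fractions of page width/height.
REGIONS = {
    "date": (0.60, 0.05, 0.95, 0.10),
    "amounts": (0.55, 0.40, 0.95, 0.80),
}


def region_boxes(width, height, regions=REGIONS):
    """Return {region name: pixel box} for a page of the given size."""
    return {name: to_pixel_box(box, width, height) for name, box in regions.items()}
```

With Pillow installed, `Image.open(path).crop(box)` gives each region, which can then be OCR'd on its own, for example with `--psm 7` when the region holds a single text line.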
>
> On Fri, Feb 5, 2021 at 5:33 PM Kumar Rajwani
> wrote:
>
>> i have tried to
On Friday, February 5, 2021 at 5:50:30 PM UTC+5:30 Kumar Rajwani wrote:
> The main thing is that I want to learn about training Tesseract at the image level, so
> can you please tell me how I can proceed further? I want to know where
> the main problem is.
>
>
> On Friday, February 5, 2021 at 5:46:22 PM UTC+5
://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
>
>
> https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
>
> On Fri, Feb 5, 2021 at 6:14 PM Kumar Rajwani
> wrote:
>
>> i h
Hey, can you please tell me how I can improve text detection for the
same kind of images?
On Friday, February 5, 2021 at 8:38:31 PM UTC+5:30 Kumar Rajwani wrote:
> Thanks for this. I know how to use Tesseract. I have multiple
> images where I can't improve image q
Hey, I am still waiting for your reply. Can you please answer my questions?
On Sunday, February 7, 2021 at 8:13:56 AM UTC+5:30 Kumar Rajwani wrote:
> On Friday, February 5, 2021 at 8:38
Hey, I am able to train Tesseract 5 on my images, and I have seen some
improvement. But sometimes words are predicted incorrectly, for example
"your" becomes "vour" and "commercial" becomes "commer cal".
Can you guide me on how to solve this?
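Errors like these are often handled with a post-processing step rather than more training. A minimal sketch, assuming you have a vocabulary of words expected in your documents (the `VOCAB` set below is a hypothetical placeholder): it snaps unknown tokens to the closest vocabulary word with `difflib.get_close_matches`, and first tries merging a token with the next one to undo splits like "commer cal". This is a generic dictionary-correction technique, not something Tesseract does for you.

```python
import difflib

# Hypothetical vocabulary; in practice, build it from text typical of your documents.
VOCAB = {"your", "commercial", "invoice", "total", "amount"}


def correct_word(word, vocab=VOCAB, cutoff=0.7):
    """Return the closest vocabulary word (lowercase), or the word unchanged."""
    if word.lower() in vocab:
        return word
    matches = difflib.get_close_matches(word.lower(), sorted(vocab), n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, vocab=VOCAB, cutoff=0.7):
    """Correct each token, merging it with the next token when the merge is a known word."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower() not in vocab and i + 1 < len(tokens):
            merged = correct_word(tok + tokens[i + 1], vocab, cutoff)
            if merged.lower() in vocab:
                out.append(merged)
                i += 2
                continue
        out.append(correct_word(tok, vocab, cutoff))
        i += 1
    return " ".join(out)
```

The cutoff is a trade-off: lower values fix more OCR errors but also start "correcting" legitimate words that are missing from the vocabulary.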
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
----
> web:http://www.hobbelt.com/
> http://www.hebbut.net/
> mail: g...@hobbelt.com
> mobile: +31-6-11 120 978
> ------
>
>
> On Mon, Feb 8, 2021 at 1:47 PM Kumar Rajwani
> wrote:
> cells spanning columns or
> rows. So more research is needed before I'd code that preprocessing step.
>
> Another issue with the line detection + removal/zoning techniques would be
> making sure the lines are either near perfect horizontal and vertical all
> (*orienting*/*deskewing* the ima
Installation:
!apt-get install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr
!sudo apt-get install libenchant1c2a
!sudo add-apt-repository -y ppa:alex-p/tesseract-ocr-devel
!sudo apt-get update
# !sudo apt-get install tesseract-ocr
!sudo apt install -y tes
The latest push is working fine, but when an image is blurry or has some noise,
it can't process the image; it shows "Detected 12 diacritics". The
previous version was working fine with my images.
Thanks shree, I will report it in the repo.
On Tuesday, March 23, 2021 at 1:54:38 PM UTC+5:30 shree wrote:
> Please report as issue in tesseract repo.
>
> On Tue, Mar 23, 2021, 13:46 Kumar Rajwani wrote:
>
>> The latest push is working fine, but when the image is blurry or has some
Hey,
I am using Tesseract to identify amounts in my forms. You can look at the image
below for a sample. I get the exact amounts, with decimals, with psm 6,
but with psm 11 I get the following output. I have to use psm 11 because
it identifies more text than psm 6 in my images.
250,941
00
00
-7
2,860.00
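If psm 11 has to stay, one workaround is to repair split amounts after OCR. The sketch below is a hypothetical heuristic, not a Tesseract feature: it assumes that an integer token such as `250,941` immediately followed by a two-digit token such as `00` is one amount whose decimal point was lost, and joins them. It will mis-merge if your forms contain genuine standalone two-digit values, so check it against your own data.

```python
import re

# Assumed token shapes: an optional minus and dollar sign, then 1-3 digits with
# optional ",ddd" groups (e.g. 250,941 or -7), and a bare two-digit "cents" token.
INTEGER_PART = re.compile(r"-?\$?\d{1,3}(,\d{3})*")
CENTS_PART = re.compile(r"\d{2}")


def merge_split_amounts(tokens):
    """Join 'integer' + 'two digits' token pairs into one decimal amount."""
    merged = []
    i = 0
    while i < len(tokens):
        tok = tokens[i].strip()
        nxt = tokens[i + 1].strip() if i + 1 < len(tokens) else ""
        if INTEGER_PART.fullmatch(tok) and CENTS_PART.fullmatch(nxt):
            merged.append(f"{tok}.{nxt}")
            i += 2
        else:
            merged.append(tok)
            i += 1
    return merged
```

Applied to the psm 11 lines above, it joins the first two tokens into `250,941.00` and leaves the rest unchanged; whether that matches the real value on the form has to be checked against the image.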
> $ 0.00
> $ 163,447.00
>
> The legacy engine could be better for numbers.
>
> Zdenko
>
>
> On Wed, Apr 21, 2021 at 14:10 Kumar Rajwani wrote:
>
>> Hey,
>> I am using tesseract to identify amounts in my forms. You can look below
>> image for sample.
e result for the image you provided.
> 2. I suggest you try a different oem (OCR engine mode).
> 3. I know that invoice digitization tools use different parameters for
> parsing numbers.
>
>
> Zdenko
>
>
> On Wed, Apr 21, 2021 at 17:45 Kumar Rajwani wrote:
>
>> Hi Zdenop, As i sai
Can you tell me whether there is any way to make psm 11 recognize
numbers well? That would be great.
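Tesseract accepts per-run parameter overrides through repeated `-c name=value` options, which makes it straightforward to compare settings on the same image, such as the `textord_min_linesize=3` and `textord_dotmatrix_gap=3` values tried in this thread, or the legacy engine via `--oem 0`. A sketch that builds one command per experiment; the image path, output names (`run0`, `run1`, ...) and the `EXPERIMENTS` grid are placeholders, and whether any parameter helps with decimals has to be checked on your own images.

```python
def tesseract_args(image, out_base, psm=11, oem=None, lang="eng", params=None):
    """Build a tesseract CLI call; params become repeated '-c name=value' options."""
    args = ["tesseract", image, out_base, "-l", lang, "--psm", str(psm)]
    if oem is not None:
        # --oem 0 (legacy) only works with traineddata that includes the legacy model.
        args += ["--oem", str(oem)]
    for name, value in (params or {}).items():
        args += ["-c", f"{name}={value}"]
    return args


# Hypothetical experiment grid: each entry is one run on the same image.
EXPERIMENTS = [
    {"params": {}},
    {"params": {"textord_min_linesize": 3}},
    {"params": {"textord_dotmatrix_gap": 3}},
    {"oem": 0, "params": {}},
]


def experiment_commands(image, experiments=EXPERIMENTS):
    """One command per experiment, each writing to its own output base (run0, run1, ...)."""
    return [tesseract_args(image, f"run{i}", **exp) for i, exp in enumerate(experiments)]
```

Each command can be run with `subprocess.run(cmd, check=True)`; tesseract then writes plain text to `run0.txt`, `run1.txt`, and so on, which can be compared side by side.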
On Thursday, April 22, 2021 at 12:11:59 PM UTC+5:30 Kumar Rajwani wrote:
> Hey zdenop, that was the portion of the full image that was not detected
> properly by Tesseract. In full
Hey, I have tested the parameter textord_dotmatrix_gap=3, which I think
combines the number with its decimal part. Can you please tell me whether I am right?
Thanks
or me?
Thanks
On Friday, April 23, 2021 at 1:46:44 PM UTC+5:30 Kumar Rajwani wrote:
> Hi, can you please look at this image so we can get a clearer idea of why
> I want to go with psm 11.
> If you try this image with psm 6, then
> it will miss the first line, and the date will be wrong als
Hey, can you please suggest how I can achieve better results?
On Friday, April 23, 2021 at 9:10:51 PM UTC+5:30 Kumar Rajwani wrote:
> Hey zdenop, can you please look at this command:
> tesseract /content/img.png out2 --psm 11 -c textord_min_linesize=3
> It's working for me. Ple
[image: Capture.PNG]
Hey, can you please explain this behaviour to me? The only difference
is the eng2 model, which I have fine-tuned on my documents; eng is the default
Tesseract model.
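To see exactly where the fine-tuned eng2 model and the default eng model disagree, one option is to OCR the same image twice (`-l eng` and `-l eng2`) and diff the two text outputs line by line. A sketch using `difflib.unified_diff`; it assumes both outputs have already been read into strings, and the labels are only used in the diff header.

```python
import difflib


def diff_ocr(text_a, text_b, label_a="eng", label_b="eng2"):
    """Return a unified diff of two OCR outputs, line by line ('' if identical)."""
    return "".join(
        difflib.unified_diff(
            text_a.splitlines(keepends=True),
            text_b.splitlines(keepends=True),
            fromfile=label_a,
            tofile=label_b,
        )
    )
```

Lines prefixed with `-` come from the eng output and lines prefixed with `+` from eng2, which makes it easier to tell whether the fine-tuned model loses whole lines or only changes individual characters.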
I have also posted an issue in the Tesseract repo; you can look at it here:
https://github.com/tesseract-ocr/tesseract/issues/3450
Environment
- *Tesseract Version*: tesseract 5.0.0-alpha-20210401-94-ga968
- *Platform*: ubuntu 18.04 colab
I have to reduce my trained model time as per the default tessera
May I know which dataset Tesseract was trained on?
If you know of any other OCR dataset of black-and-white images, please
provide a link.
Thanks
No, I haven't found any dataset.
I want to know the initial dataset Tesseract was trained on.
If you want to fine-tune, Tesseract training creates synthetic data
from a txt file.
If you find any dataset, please share it here.
On Thursday, August 19, 2021 at 3:08:55 PM UTC+5:30 nebiy...@gmail.co