Hi all,
I tried to run an example of LSTM training and used the following command:
*for f in *.tif; do tesseract "$f" "${f%.*}" -l deu lstmbox; done*
The resulting box files seem to use a single line-level box instead of
character-level boxes. All the characters share the same coordinates, width
a
Is there a traineddata file that contains both Arabic characters and numbers?
I can get only characters or only numbers, not both.
Also, I use this with Java, not the OCR tool.
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and
Did you find anything?
On Wednesday, 16 March 2022 at 09:47:05 UTC+2 Tahir Rehman wrote:
> Hi all,
>
> I'm working on a project that needs OCR for Arabic cards. These cards
> can be identity cards, business cards, or visiting cards.
>
> If anyone has an idea for any open source project or op
Are you following official tutorials?
Did you read the documentation?
Have you tried to check the official training repository and provided
examples?
Zdenko
On Wed, 1 Nov 2023 at 10:15, TRAN TRONG KHANH[학생](대학원 컴퓨터공학과) <
khanhtran...@khu.ac.kr> wrote:
> Hi all,
>
> I tried to run an example of
FYI, I asked the same question in
https://groups.google.com/g/tesseract-ocr/c/9myrnSD0HKM
On Wednesday, November 1, 2023 at 7:21:37 AM UTC-4 zdenop wrote:
> Are you following official tutorials?
> Did you read the documentation?
> Have you tried to check the official training repository and pro
On 1 Nov 2023 at 11:51:27 AM, TRAN TRONG KHANH[학생](대학원 컴퓨터공학과) <
khanhtran...@khu.ac.kr> wrote:
>
Are you trying to generate box files from the images (tif files)?
I don't know what you are trying to do. I am not familiar with this method
of box generation. But, I think the command you are running is supposed to
generate them with the same coordinates. Look at the example here:
https://tesseract-ocr.github.io/tessdoc/tess4/Make-Box-Files.html
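To check this on your own files, here is a minimal sketch. It assumes each box line has the form `symbol left bottom right top page` and that a tab symbol closes each text line, as in the example on that page. The function names and the sample data are made up for illustration and are not part of Tesseract.

```python
from collections import Counter


def parse_box_line(line):
    """Split one box-file line into (symbol, left, bottom, right, top, page)."""
    # Split from the right so that a symbol which is itself a space or a tab survives.
    symbol, *numbers = line.rsplit(" ", 5)
    left, bottom, right, top, page = (int(n) for n in numbers)
    return symbol, left, bottom, right, top, page


def box_sharing(box_text):
    """Count how many symbols sit on each distinct (left, bottom, right, top) box."""
    counts = Counter()
    for line in box_text.splitlines():
        if not line:
            continue
        symbol, left, bottom, right, top, _page = parse_box_line(line)
        if symbol == "\t":  # assumed end-of-line marker, not a real character
            continue
        counts[(left, bottom, right, top)] += 1
    return counts


# Hypothetical data: three characters that all carry the box of their text line.
LINE_LEVEL = "D 10 20 300 60 0\ne 10 20 300 60 0\nu 10 20 300 60 0\n\t 300 20 304 60 0\n"
# Hypothetical data: each character has its own box.
CHAR_LEVEL = "D 10 20 30 60 0\ne 32 20 50 48 0\n"

print(box_sharing(LINE_LEVEL))
print(box_sharing(CHAR_LEVEL))
```

If one box is shared by many symbols, the file uses line-level coordinates; if every box appears once, it is character-level.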
On Wednes
I am not sure if you are supposed to use those box files for training
purposes. All the guides and manuals I have read use either the text2image
script or the manual method (which is presumably outdated).
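For reference, a rough sketch of what a text2image call looks like when assembled from Python. The flags `--text`, `--outputbase`, `--font` and `--fonts_dir` are the ones I know text2image accepts. The file names, font and fonts directory are placeholders, and the helper name is hypothetical. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def text2image_command(text_file, output_base, font, fonts_dir):
    """Build (but do not run) a text2image call that renders text_file into images."""
    return [
        "text2image",
        f"--text={text_file}",
        f"--outputbase={output_base}",
        f"--font={font}",
        f"--fonts_dir={fonts_dir}",
    ]


# Placeholder arguments: adjust to your own training text and installed fonts.
cmd = text2image_command("deu.training_text", "deu.Arial.exp0", "Arial", "/usr/share/fonts")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it, assuming text2image is installed and on the PATH.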
On Wednesday, October 18, 2023 at 6:27:58 PM UTC+3 Keith Smith wrote:
> I tried using
Doesn't the official Arabic model include the numerals?
Arabic numerals are supposed to be part of almost all the models.
The Amharic model, I am working on, for example, does recognize Arabic
numerals (of course, along with the regular letter characters).
You need to preprocess the images first. I recommend trying ScanTailor.
You can then run Tesseract on the processed images; the accuracy should
improve.
Are you using the official English model to OCR them?
On Wednesday, November 1, 2023 at 2:18:54 PM UTC+3 zdenop wrote:
> Read the do
"Please note that box files generated using makebox config file are OK for
training legacy models but not for LSTM training." Makebox is the config
file included with Tesseract to generate box files. It looks like that was
used for the legacy model. For the current model, text2image is the way to
d
Thank you for your responses. Regarding my question and referring to the
official documentation at Tesseract Documentation, the generated .box files
have the *same coordinates* for every character because they use line-level
boxes instead of character-level boxes.
Also, I have a couple of conce
Thank you for your responses. Regarding my question and referring to the
official documentation at
https://tesseract-ocr.github.io/tessdoc/tess4/Make-Box-Files.html , the
generated .box files for LSTM-based training have the *same coordinates* for
every character because they use line-level bo
*1. Using synthetic data:*
What can you do if you do not have data that is confirmed to be accurate?
The only way around that I know of is to use synthetic data. That is, you
generate the images from the texts using the text2image script. You then
train from those. The accuracy of the result mod
To clarify, Shree's script is useful in case your images are not single
line. If they are all single line, that script won't do much for you.
On Wednesday, November 1, 2023 at 4:20:09 PM UTC+3 Des Bw wrote:
>
> *1. Using synthetic data:*
> What can you do if you do not have data that is conf
I tried ara.traineddata, Arabic.traineddata and ara-Amiri.traineddata. None
of them has the Arabic (Indian) numbers, but all have the normal (English) numbers.
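To be clear about which digits I mean: the "normal" ones are ASCII 0-9, and the "Arabic (Indian)" ones are the Arabic-Indic digits U+0660 to U+0669. A small sketch for telling them apart in OCR output text. The names here are made up for illustration. It only inspects or converts text that was already recognized; it does not make a model recognize digits it was not trained on.

```python
ASCII_DIGITS = "0123456789"
# Arabic-Indic digits occupy the code points U+0660 through U+0669.
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

TO_ARABIC_INDIC = str.maketrans(ASCII_DIGITS, ARABIC_INDIC_DIGITS)
TO_ASCII = str.maketrans(ARABIC_INDIC_DIGITS, ASCII_DIGITS)


def digit_kinds(text):
    """Return which digit sets occur in text: 'ascii', 'arabic-indic', both, or neither."""
    kinds = set()
    for ch in text:
        if ch in ASCII_DIGITS:
            kinds.add("ascii")
        elif ch in ARABIC_INDIC_DIGITS:
            kinds.add("arabic-indic")
    return kinds


print(digit_kinds("ID 2023"))
print("2023".translate(TO_ARABIC_INDIC))
```

Translating both the OCR output and the expected text with `TO_ASCII` before comparing them makes it easier to see whether the digits were misread or merely returned in the other digit set.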
On Wednesday, 1 November 2023 at 14:09:45 UTC+2 desal...@gmail.com wrote:
> Doesn't the official Arabic model include the numerals?
> The Arabic
On Wednesday, November 1, 2023 at 10:02:22 AM UTC-4 mosta@gmail.com
wrote:
I tried ara.traineddata, Arabic.traineddata and ara-Amiri.traineddata. None
of them has the Arabic (Indian) numbers, but all have the normal (English) numbers.
You might want to clarify whether you are referring
to: https:
Doesn't anybody have any ideas? :-(
On Tuesday, October 24, 2023 at 5:40:20 PM UTC+1 Slartybartfast wrote:
> Hi
> I am a new tesseract user, and I'm really struggling to get it to produce
> any kind of sensible results, especially with numerical text. I have some
> text that looks like this:
>