[tesseract-ocr] Train tesseract with a font for European car license plates

2024-05-31 Thread 'Ronny Zimmermann' via tesseract-ocr
I'm trying to improve tesseract's recognition of European license plates. The corresponding font has only 41 characters. I followed the steps below, but I'm not sure I'm using them correctly (bash script): # tesseract-ocr training script # Generation of image and box files text2image --fonts_dir /u
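Because the plate font covers only a fixed set of symbols, one option is to render synthetic plate strings that use nothing else. The sketch below assumes a hypothetical 41-symbol inventory (A-Z, 0-9, Ä/Ö/Ü, hyphen and space) and a German-style "AB-CD 1234" layout. Both are placeholders: replace them with the font's real character list and the plate formats you need.

```python
import random
import string

# Hypothetical 41-symbol inventory; replace with the characters your plate font really has.
PLATE_CHARSET = string.ascii_uppercase + string.digits + "ÄÖÜ" + "- "
REGION_LETTERS = string.ascii_uppercase + "ÄÖÜ"


def random_plate(rng: random.Random) -> str:
    """Return one plate-like string in an assumed 'AB-CD 1234' layout."""
    region = "".join(rng.choice(REGION_LETTERS) for _ in range(rng.randint(1, 3)))
    letters = "".join(rng.choice(string.ascii_uppercase) for _ in range(rng.randint(1, 2)))
    digits = "".join(rng.choice(string.digits) for _ in range(rng.randint(1, 4)))
    return f"{region}-{letters} {digits}"


def make_training_lines(n_lines: int = 1000, seed: int = 0) -> list[str]:
    """Generate reproducible lines and reject any symbol outside the font's inventory."""
    rng = random.Random(seed)
    allowed = set(PLATE_CHARSET)
    lines = [random_plate(rng) for _ in range(n_lines)]
    for line in lines:
        unknown = set(line) - allowed
        if unknown:
            raise ValueError(f"symbols not in the font: {sorted(unknown)}")
    return lines


def write_training_text(path: str, lines: list[str]) -> None:
    """Write one plate string per line, UTF-8 encoded (needed for the umlauts)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    write_training_text("plates.training_text", make_training_lines())
```

You could then pass the resulting file to text2image through its --text option, together with --font, --fonts_dir and --outputbase. The font name and paths depend on your setup.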

[tesseract-ocr] Train Tesseract 5 german for new font

2024-05-12 Thread testcoal
Hi, I wanted to reach out regarding my recent attempt to train Tesseract 5 for a new font, specifically in German. I followed a tutorial I found on YouTube (https://www.youtube.com/watch?v=KE4xEzFGSU8) and initially had success when training it for English. However, upon transitioning to Germa

[tesseract-ocr] Train Tesseract with my own Data

2024-04-22 Thread testcoal
Hi, I am trying to train a Tesseract model with my own data. This is my code: import os # Configure paths TRAIN_DATA_DIR = "./data1" TRAIN_LISTFILE = "./trainingsliste.txt" OUTPUT_DIR = "./output" TRAINEDDATA = "./tesseract-4.1/tessdata/deu.traineddata" # Check required paths if not os.pat

Re: [tesseract-ocr] Train Tesseract (german)

2024-04-18 Thread Misti Hamon
Scanned books? No help on training or choosing datasets, but if these images are photoscanned book pages, did you run them through book-specific processing software (scantailor, spreads, or bookscan wizard are the 3 I know of, plus the Internet Archive's scan tool scripts) to split your source

[tesseract-ocr] Train Tesseract (german)

2024-04-18 Thread testcoal
Hi, I've been utilizing Tesseract 4 to extract text from PNG and TIFF images, and all the content is in German. While the image quality is pretty decent, the extraction results have been less than stellar for some of them. I understand that to improve OCR accuracy, training Tesseract with addit

[tesseract-ocr] train tesseract

2024-03-04 Thread thangaraj r
How do I train Tesseract and prepare a dataset? -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To view this discussion on t

[tesseract-ocr] Train tesseract on predefined train and validation splits

2021-01-22 Thread lebo
Hi everyone, I am training tesseract from scratch on a dataset with predefined train and validation splits. Is it possible to create custom list.eval and list.train for this use case? Thanks for your help. Lebo
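In principle, yes. lstmtraining reads plain-text list files (--train_listfile, --eval_listfile) that contain one .lstmf path per line, so you can write the lists directly from a predefined split instead of letting a script shuffle and split the samples. Here is a minimal sketch. It assumes each sample has already been converted to <lstmf_dir>/<sample_id>.lstmf; that naming is hypothetical, so adapt it to however your files are named.

```python
import tempfile
from pathlib import Path


def write_split_lists(lstmf_dir, train_ids, eval_ids, out_dir):
    """Write list.train and list.eval (one .lstmf path per line) from fixed splits."""
    lstmf_dir, out_dir = Path(lstmf_dir), Path(out_dir)
    overlap = set(train_ids) & set(eval_ids)
    if overlap:  # a sample in both lists would leak into the evaluation
        raise ValueError(f"sample IDs present in both splits: {sorted(overlap)}")
    written = {}
    for list_name, ids in (("list.train", train_ids), ("list.eval", eval_ids)):
        paths = [lstmf_dir / f"{sample_id}.lstmf" for sample_id in ids]
        missing = [str(p) for p in paths if not p.is_file()]
        if missing:
            raise FileNotFoundError(f"{list_name}: missing {missing[:3]}")
        (out_dir / list_name).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
        written[list_name] = paths
    return written


if __name__ == "__main__":
    # Tiny self-check with empty placeholder files.
    work = Path(tempfile.mkdtemp())
    for sample_id in ("page1", "page2", "page3"):
        (work / f"{sample_id}.lstmf").touch()
    print(write_split_lists(work, ["page1", "page2"], ["page3"], work))
```

You would then pass the two files as lstmtraining --train_listfile list.train --eval_listfile list.eval, alongside the usual model options.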

Re: [tesseract-ocr] Train Tesseract to ignore music?

2019-06-28 Thread Timothy Snyder
A picture would be helpful. From my experience, however, writing an independent program to segment text from "noisy" images with a lot of non-text print will give you the best results. Depending on how much the layout of those books varies between pages, this could be a simple or complicated task.

Re: [tesseract-ocr] Train Tesseract to ignore music?

2019-06-28 Thread Lorenzo Bolzani
Hi Sara, can you please post a sample picture? You could probably detect the staff (the five-line "pentagram"), using Hough lines with very tight parameters or custom horizontal line detection, and just replace it with a white rectangle. Lorenzo On Fri, 28 Jun 2019 at 07:15, Sara Palmer wrote: > I'd like to p
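Lorenzo's idea is to find the staff and paint over it before OCR. The sketch below replaces the Hough transform he mentions with a simpler row-projection heuristic, so treat it as a stand-in rather than his method. A row counts as a staff line if it contains a long horizontal run of dark pixels. Nearby line rows are grouped, and any group with at least five distinct lines is blanked as a white band. The code assumes a deskewed grayscale page given as a list of rows of 0-255 values, and all thresholds are guesses to tune. Notes sitting on the staff are erased along with it, which is the goal here, but lyrics printed inside the band would be lost too.

```python
def find_line_rows(img, min_run_fraction=0.6, dark=128):
    """Indices of rows whose longest run of dark pixels spans most of the width."""
    width = len(img[0])
    min_run = max(1, int(min_run_fraction * width))
    rows = []
    for y, row in enumerate(img):
        longest = run = 0
        for value in row:
            run = run + 1 if value < dark else 0
            longest = max(longest, run)
        if longest >= min_run:
            rows.append(y)
    return rows


def group_rows(rows, max_gap):
    """Split sorted row indices into groups separated by more than max_gap rows."""
    groups = []
    for y in rows:
        if groups and y - groups[-1][-1] <= max_gap:
            groups[-1].append(y)
        else:
            groups.append([y])
    return groups


def blank_staves(img, min_lines=5, max_gap=12, margin=6, **line_kw):
    """Return (cleaned copy, [(top, bottom), ...]) with each detected staff painted white."""
    out = [row[:] for row in img]
    height, width = len(img), len(img[0])
    bands = []
    for group in group_rows(find_line_rows(img, **line_kw), max_gap):
        # Adjacent rows belong to one thick line; count separate lines only.
        n_lines = 1 + sum(1 for a, b in zip(group, group[1:]) if b - a > 1)
        if n_lines < min_lines:  # e.g. a single rule or underline, not a staff
            continue
        top, bottom = max(0, group[0] - margin), min(height - 1, group[-1] + margin)
        for y in range(top, bottom + 1):
            out[y] = [255] * width
        bands.append((top, bottom))
    return out, bands
```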

[tesseract-ocr] Train Tesseract to ignore music?

2019-06-27 Thread Sara Palmer
I'd like to produce high-quality OCR of books that contain text interspersed with music. Is it possible to train Tesseract to ignore musical notation instead of turning it into junk OCR? How would one go about doing this?

[tesseract-ocr] Train Tesseract for Number plate

2019-04-07 Thread Gaurav Sharma
Hi Authors, I want to train Tesseract OCR for my number plates and get num.traineddata as output, so that I can use it to recognize number plates. I have extracted number plate images from it, and I want to train my Tesseract OCR on them. Please look into it; any help would be appreciated. Tesseract versio

Re: [tesseract-ocr] train tesseract 4.0

2019-01-24 Thread Aodren BARY
Try something like this instead: shapeclustering -F font_properties -U unicharset *.tr

Re: [tesseract-ocr] train tesseract 4.0

2019-01-24 Thread Pradeep Kumar Nalluri
shapeclustering -F font_properties –U unicharset {.tr files} // all the .tr files, without the curly braces. Thanks, with regards, Pradeep. On Thursday, January 24, 2019 at 1:38:42 PM UTC+5:30, Aodren BARY wrote: > What's the command line you make?
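The command above contains "–U" with an en dash (U+2013) rather than the ASCII hyphen in "-U". This commonly happens when a command is copied from a web page or a word processor, and the tool then most likely does not see a -U option at all. A small check like the one below can flag such characters and normalize them before a pasted command is run. It is a sketch, and the list of look-alike dashes is our own choice.

```python
import shlex
import unicodedata

# Dash-like characters that look like '-' but are not ASCII HYPHEN-MINUS.
LOOKALIKE_DASHES = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"


def find_lookalike_dashes(command: str):
    """Return (position, character, Unicode name) for every look-alike dash."""
    return [(i, ch, unicodedata.name(ch)) for i, ch in enumerate(command)
            if ch in LOOKALIKE_DASHES]


def normalized_argv(command: str) -> list[str]:
    """Replace look-alike dashes with '-' and split the command like a POSIX shell."""
    table = str.maketrans({ch: "-" for ch in LOOKALIKE_DASHES})
    return shlex.split(command.translate(table))


if __name__ == "__main__":
    pasted = "shapeclustering -F font_properties \u2013U unicharset eng.font.exp0.tr"
    for pos, ch, name in find_lookalike_dashes(pasted):
        print(f"column {pos}: {ch!r} is {name}")
    print(normalized_argv(pasted))
```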

Re: [tesseract-ocr] train tesseract 4.0

2019-01-24 Thread Aodren BARY
What command line are you running?

Re: [tesseract-ocr] train tesseract 4.0

2019-01-24 Thread Pradeep Kumar Nalluri
Hi, my .tr file is not empty either. Here is one of my .tr files: strangelabelmachinefont A 6 2 11 9 0 4 mf 4 0.22041225 0.042717576 0.52497548 0.76966679 0 0 0.039773703 0.30320382 0.29657078 0 0 0 -0.20557119 0.020642474 0.59753317 0.19734018 0 0 -0.024932638 -0.23984377 0.55714816 0.51

Re: [tesseract-ocr] train tesseract 4.0

2019-01-23 Thread Aodren BARY
No, I assumed wrong :) It's your .tr file that is empty. Do you get any error output when you run tesseract [...] nobatch box.train? Thanks

Re: [tesseract-ocr] train tesseract 4.0

2019-01-23 Thread Pradeep Kumar Nalluri
Hi, my unicharset is as follows: 21 NULL 0 Common 0 Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a |Broken|0|1 f 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken B 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 B # B [42 ]A H 5 0,255,0,255,0,0,0,0,0,0 Latin 4

Re: [tesseract-ocr] train tesseract 4.0

2019-01-23 Thread Aodren BARY
Hi, the assert error means you have a null pointer. These messages don't say much, but I assume it's because your unicharset file is empty. Can you show how you create your unicharset file?
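Both suspects raised in this thread, an empty unicharset or empty .tr files, are quick to rule out before running shapeclustering. The sketch below is a hypothetical helper, not part of Tesseract. It checks that the unicharset's first line (the entry count) matches the number of entries that follow. It also decodes the hexadecimal property field (1 = alpha, 2 = lower, 4 = upper, 8 = digit, 0x10 = punctuation), so that, for example, "B 5" reads as an uppercase letter. Finally, it lists any zero-byte .tr files in a directory.

```python
from pathlib import Path

PROPERTY_BITS = {0x1: "alpha", 0x2: "lower", 0x4: "upper", 0x8: "digit", 0x10: "punct"}


def check_unicharset(text: str) -> dict:
    """Validate unicharset text and return {symbol: set of property names}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("unicharset is empty")
    try:
        declared = int(lines[0].split()[0])
    except ValueError:
        raise ValueError(f"first line should be the entry count, got {lines[0]!r}") from None
    entries = lines[1:]
    if len(entries) != declared:
        raise ValueError(f"header declares {declared} entries but {len(entries)} follow")
    props = {}
    for entry in entries:
        fields = entry.split()
        if len(fields) < 2:
            raise ValueError(f"malformed entry: {entry!r}")
        flags = int(fields[1], 16)  # the property bits are written in hexadecimal
        props[fields[0]] = {name for bit, name in PROPERTY_BITS.items() if flags & bit}
    return props


def empty_tr_files(directory) -> list[str]:
    """Names of zero-byte .tr files in a directory."""
    return sorted(p.name for p in Path(directory).glob("*.tr") if p.stat().st_size == 0)
```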

Re: [tesseract-ocr] train tesseract 4.0

2019-01-23 Thread Pradeep Kumar Nalluri
Hi, I am new to Tesseract as well. In my OCR, there is trouble recognising the letter "I", so I decided to train by taking several examples of the "I" character. I created the box files, modified them accordingly and then created the unicharset. Then, when I tried to create the .tr file, I am ge

Re: [tesseract-ocr] train tesseract 4.0

2019-01-21 Thread Aodren BARY
Many thanks, I tested your process. I tried it with one image file without any modification; the results are very bad, but for once I can see the whole training process run without any segmentation fault. I am going to try with a larger set of images to see what happens, and I am going to continue to test f

Re: [tesseract-ocr] train tesseract 4.0

2019-01-20 Thread 易鑫
Tesseract v4. On Fri, 18 Jan 2019 at 10:17 PM, Aodren BARY wrote: > Thanks for the answer. I haven't tried your solution yet, but I will. Do you use Tesseract v3 or v4? > On Friday, 18 January 2019 10:35:26 UTC+1, 易鑫 wrote: >> Hello, I am also a new user of Tesseract. I have trained Tesseract myself and

Re: [tesseract-ocr] train tesseract 4.0

2019-01-18 Thread Aodren BARY
Thanks for the answer. I haven't tried your solution yet, but I will. Do you use Tesseract v3 or v4? On Friday, 18 January 2019 10:35:26 UTC+1, 易鑫 wrote: > Hello, I am also a new user of Tesseract. I have trained Tesseract myself and it improved my results. I do not use the "lstm" file, may be

[tesseract-ocr] train tesseract 4.0

2019-01-16 Thread Aodren BARY
Hi, I am a new user of Tesseract, and I am a bit lost. I want to use Tesseract to extract text from receipts. In a way, my receipts are clean (not scanned, photographed directly). The results are great, but I want to improve them, and I want good results with scanned receipts too... So, I read the wiki, and

Re: [tesseract-ocr] Train Tesseract 4.0 for Urdu Nastaleeq fonts

2018-10-24 Thread Zdenko Podobny
If you want to train for version 4.0, you should follow the training instructions for version 4.00. If you decide to go your own way, that is fine, but please do not claim that the official instructions do not work for you or that the output is very inaccurate. Zdenko On Wed, 24 Oct 2018 at 9:20, Shubham Gupta wrote

Re: [tesseract-ocr] Train Tesseract 4.0 for Urdu Nastaleeq fonts

2018-10-24 Thread Shubham Gupta
I am using an automated way of generating files like the .unicharset file, .normproto file, inttemp file, etc., which ultimately gives the traineddata file. I am using the JTesseract Editor utility. I gave it my text files and it generated the traineddata and all the other files for me. But when I give nastaleeq input fil

Re: [tesseract-ocr] Train Tesseract 4.0 for Urdu Nastaleeq fonts

2018-10-24 Thread Zdenko Podobny
What did not work for you? Zdenko On Wed, 24 Oct 2018 at 9:04, wrote: > Yes, I did. But it's not working out for me. > On Wednesday, October 24, 2018 at 12:31:09 PM UTC+5:30, zdenop wrote: >> Did you read >> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00? >> Zdenk

Re: [tesseract-ocr] Train Tesseract 4.0 for Urdu Nastaleeq fonts

2018-10-24 Thread gupta . shubham
Yes, I did. But it's not working out for me. On Wednesday, October 24, 2018 at 12:31:09 PM UTC+5:30, zdenop wrote: > Did you read > https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00? > Zdenko > On Wed, 24 Oct 2018 at 8:59, wrote: >> I am trying to train Tesseract fo

Re: [tesseract-ocr] Train Tesseract 4.0 for Urdu Nastaleeq fonts

2018-10-24 Thread Zdenko Podobny
Did you read https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00? Zdenko On Wed, 24 Oct 2018 at 8:59, wrote: > I am trying to train Tesseract for Urdu Nastaleeq fonts. I used 10 text files totaling 1 MB and gave them to the jTesseract editor to create box files and then c

[tesseract-ocr] Train Tesseract 4.0 for Urdu Nastaleeq fonts

2018-10-23 Thread gupta . shubham
I am trying to train Tesseract for Urdu Nastaleeq fonts. I used 10 text files totaling 1 MB and gave them to the jTesseract editor to create box files and then the traineddata file. But it gives an error: *Error: unichar بجا in normproto file is not in unichar set*. The output that comes
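That error suggests the training files disagree with each other. A symbol ended up in the clustering output (normproto) that the unicharset does not list. This can happen, for example, if the unicharset was extracted from a different or older set of box files. The sketch below compares the symbols in the box files with the first column of the unicharset. It is a hypothetical helper, and it assumes the classic one-symbol-per-line box format "<symbol> <left> <bottom> <right> <top> <page>".

```python
from collections import Counter


def box_symbols(box_text: str) -> Counter:
    """Count the symbols in a box file; each line ends with five integer fields."""
    counts = Counter()
    for line in box_text.splitlines():
        parts = line.rsplit(None, 5)
        if len(parts) == 6 and all(p.lstrip("-").isdigit() for p in parts[1:]):
            counts[parts[0]] += 1
    return counts


def unicharset_symbols(unicharset_text: str) -> set:
    """First column of every entry after the count line."""
    lines = [ln for ln in unicharset_text.splitlines() if ln.strip()]
    return {ln.split()[0] for ln in lines[1:]}


def missing_from_unicharset(box_texts, unicharset_text: str) -> dict:
    """Symbols (with counts) used in any of the box files but absent from the unicharset."""
    known = unicharset_symbols(unicharset_text)
    used = Counter()
    for text in box_texts:
        used.update(box_symbols(text))
    return {sym: n for sym, n in used.items() if sym not in known}
```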

Re: [tesseract-ocr] train tesseract OCR 4.0

2018-10-22 Thread Shree Devi Kumar
Please see https://github.com/tesseract-ocr/tesseract/wiki and https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact On Mon, 22 Oct 2018, 06:59 kislay bajpai, wrote: > Hello, > > Sorry to disturb you, actually i am very new with tesseract and getting no >

Re: [tesseract-ocr] train tesseract OCR 4.0

2018-10-22 Thread kislay bajpai
Hello, sorry to disturb you. I am very new to Tesseract and have no idea how to train it. Please help me out; I am in big trouble. Version: tesseract 4.0 alpha. OS: Ubuntu 16.04 and RHEL 7.3 (I can use either one). On Tue, Oct 16, 2018 at 7:10 PM Shree Devi Kumar wrote: > Please do n

Re: [tesseract-ocr] train tesseract OCR 4.0

2018-10-16 Thread Shree Devi Kumar
Please do not use tesseract 4.0 alpha; there have been many changes since then. Use the latest code from GitHub, which is 4.0.0-rc3, or install from Alex's PPA or from UB Mannheim (for Windows). Please read the wiki pages about training a new font for tesseract 4 (Fine Tuning for Impact). On Tu

Re: [tesseract-ocr] train tesseract OCR 4.0

2018-10-16 Thread kislay bajpai
Hello Shree, I am confused about how to train tesseract 4.0 alpha for a new font (E 13B). Please help me with it. On Thursday, March 23, 2017 at 5:24:59 PM UTC+5:30, shree wrote: > To read characters from an image, it is not necessary to train it. Just use an appropriate traineddata. > Training i

Re: [tesseract-ocr] Train Tesseract 4.0 on Windows 8

2018-04-20 Thread crytoy
divana@divana-pc MINGW64 ~/Desktop/train bash ./tesstrain.sh --fonts_dir ./fonts --lang eng --linedata_only --training_ text ./txt/english.txt --noextract_font_properties --langdata_dir ./langdata -- tessdata_dir ./tessdata --fontlist "Arial," ./tesstrain_utils.sh: line 106: test: =: unary opera

Re: [tesseract-ocr] Train Tesseract 4.0 on Windows 8

2018-04-20 Thread crytoy
bash ./tesstrain.sh --fonts_dir ./fonts --lang eng --linedata_only --training_ text ./txt/english.txt --noextract_font_properties --langdata_dir ./langdata -- tessdata_dir ./tessdata --fontlist "Arial," --output_dir ./out ./tesstrain_utils.sh: line 106: test: =: unary operator expected ./tesstra

Re: [tesseract-ocr] Train Tesseract 4.0 on Windows 8

2018-04-19 Thread ShreeDevi Kumar
tesstrain.sh is a bash shell script; you don't need Python for it. Try the following (give the correct path): bash ./tesstrain.sh ShreeDevi On Thu, Apr 19, 2018 at 8:01 PM, wrote:

[tesseract-ocr] Train Tesseract 4.0 on Windows 8

2018-04-19 Thread crytoy
I have installed the latest tesseract 4.0 binary from UB Mannheim, along with Python, Git and Java, on my 64-bit Windows 8. I am trying to run the "tesstrain.sh" script, but an error message appears. Any help?

Re: [tesseract-ocr] train tesseract to improve the half-width Japanese(Katakana) recognition.

2017-11-09 Thread Li Xianglei
> Recently I modified the tesstrain_utils.sh and --max_pages=3 option for text2image command, Correction: I meant that I modified tesstrain_utils.sh and *removed* the --max_pages=3 option. On Friday, 10 November 2017 10:29:21 AM UTC+8, Li Xianglei wrote: > Recently I modified the tesstrain_utils.sh and

Re: [tesseract-ocr] train tesseract to improve the half-width Japanese(Katakana) recognition.

2017-11-09 Thread Li Xianglei
Recently I modified tesstrain_utils.sh and the --max_pages=3 option for the text2image command. Normal Japanese now seems to work well, but the half-width characters still have poor accuracy. Now I wonder how many characters I should add to jpn.training_text; the wiki [ Fine Tuni

Re: [tesseract-ocr] train tesseract to improve the half-width Japanese(Katakana) recognition.

2017-11-08 Thread Li Xianglei
Yes, I added half-width characters to the given jpn.training_text and used the result as the new jpn.training_text. On Thursday, 9 November 2017 1:21:45 AM UTC+8, shree wrote: > does your training text include both half width and normal japanese? > ShreeDevi

Re: [tesseract-ocr] train tesseract to improve the half-width Japanese(Katakana) recognition.

2017-11-08 Thread ShreeDevi Kumar
Does your training text include both half-width and normal Japanese? ShreeDevi On Wed, Nov 8, 2017 at 4:01 PM, Li Xianglei wrote: > Hi all, > I'm trying to use tesseract to r

[tesseract-ocr] train tesseract to improve the half-width Japanese(Katakana) recognition.

2017-11-08 Thread Li Xianglei
Hi all, I'm trying to use Tesseract to recognize Japanese in images. I found that it gets poor accuracy on half-width Japanese (Katakana). I'm trying to improve the accuracy by fine-tuning, both [ Fine Tuning for ± a few characters] and [Training Just a Few

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
LSTM training is not like legacy training. Please read the wiki pages regarding 4.0 training; I have given all the sample commands there. There are 3 different ways of training. Read the training bash scripts to learn more. tesstrain.sh with --linedata_only creates the box/tiff pairs but onl

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Sorry, I gave the wrong commands for Arabic; I was actually referring to English. tesseract eng.arial.exp4.tif eng.arial.exp4 nobatch box.train unicharset_extractor eng.arial.exp4.box echo "arial 0 0 1 0 0" > font_properties # tell Tesseract information about the font mftraining -F font_prop
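For reference, the legacy (3.x-style) sequence these commands belong to can be written out as argument lists. That makes it easy to check flags such as -U before anything runs. This is a sketch of the classic steps documented for Tesseract 3 training: box.train, unicharset_extractor, shapeclustering, mftraining, cntraining, then renaming the outputs and running combine_tessdata. The lang.font.expN naming follows the usual convention. As noted in this thread, none of this applies to LSTM training in 4.x.

```python
from pathlib import Path

LEGACY_OUTPUTS = ("inttemp", "normproto", "pffmtable", "shapetable")


def legacy_training_commands(lang, font, exp_ids, font_properties="font_properties"):
    """argv lists for the Tesseract 3.x (legacy engine) training steps, in order."""
    bases = [f"{lang}.{font}.exp{i}" for i in exp_ids]
    tr_files = [f"{b}.tr" for b in bases]
    cmds = [["tesseract", f"{b}.tif", b, "nobatch", "box.train"] for b in bases]
    cmds.append(["unicharset_extractor", *(f"{b}.box" for b in bases)])
    # Note the plain ASCII hyphen in -F / -U / -O.
    cmds.append(["shapeclustering", "-F", font_properties, "-U", "unicharset", *tr_files])
    cmds.append(["mftraining", "-F", font_properties, "-U", "unicharset",
                 "-O", f"{lang}.unicharset", *tr_files])
    cmds.append(["cntraining", *tr_files])
    return cmds


def prefix_outputs_and_combine(lang, workdir="."):
    """Rename clustering outputs to <lang>.<name>; return the combine_tessdata argv."""
    workdir = Path(workdir)
    for name in LEGACY_OUTPUTS:
        src = workdir / name
        if src.exists():
            src.rename(workdir / f"{lang}.{name}")
    return ["combine_tessdata", f"{lang}."]


if __name__ == "__main__":
    # Print the commands; the font name in font_properties must match "arial" here.
    for argv in legacy_training_commands("eng", "arial", [4]):
        print(" ".join(argv))
```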

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
Arabic was never trained with the legacy tesseract engine and I doubt you will get any improvement over existing traineddata using cube or lstm. You are free to experiment and see what you come up with. I have pointed to the bash scripts for training. Please refer to them for the correct process.

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Hello Shree, thank you for your valuable reply. Are there any changes I need to make to the steps below? Please suggest the changes for the commands below; these are for tess 3.0: tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train unicharset_extractor ara.arial.exp4.box

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
see https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh if ((LINEDATA)); then phase_E_extract_features "lstm.train" 8 "lstmf" make__lstmdata else phase_E_extract_features "box.train" 8 "tr" phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto" if [

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Can you please tell me whether the command tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train is right for Tesseract 4? It produces .tr files when I run this command in Tesseract 4, for training on image files. On Wednesday, April 12, 2017 at 2:19:24 PM UTC+5:30, shree

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread Ahmad Moawad
Thanks for your reply, Shree, I appreciate it. My question: is that the right path for training Tesseract 4.0 LSTM or not? On Wednesday, April 12, 2017 at 10:49:24 AM UTC+2, shree wrote: > Read the bash scripts in > tesstrain.sh > tesstrain_utils.sh > language_specific.sh > In training directory

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
Read the bash scripts tesstrain.sh, tesstrain_utils.sh and language_specific.sh in the training directory to understand LSTM training in more detail. On 12-Apr-2017 10:47 AM, "Ahmad Moawad" wrote: > this is the part from https://github.com/tesseract-ocr/tes

[tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-11 Thread Ahmad Moawad
This is the part from https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00. My question relates to training from images, not from text: The overall training process is similar to training 3.04 Conc

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-05 Thread srnsp92
You can use *.* to select the files, but be careful that only image files are supplied, because * matches all available files. 1) Could you please help me with the posts I made today? 2) And please advise how I can generate l

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread Saurabh Srivastav
Thank you, Shree, you always help me. But I still have one problem: I wrote a bash script that finds all images with the .jpg extension and names their output files after the image. But I want the script to also pick up images with other extensions, like .jpg, .jpeg,
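One way to handle several extensions, and to collect the output in a separate folder as asked elsewhere in this thread, is to filter the directory by suffix instead of using a single *.jpg glob. The Python sketch below assumes tesseract is on PATH. It uses the positional "tesseract <image> <outputbase>" form, which writes <outputbase>.txt. The extension list is our own choice. The code refuses two images that share a base name (such as scan.jpg and scan.png), since both would write scan.txt.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_images(src_dir, extensions=IMAGE_EXTENSIONS):
    """Image files in src_dir whose suffix (case-insensitive) is in extensions."""
    return sorted(p for p in Path(src_dir).iterdir()
                  if p.is_file() and p.suffix.lower() in extensions)


def ocr_commands(src_dir, out_dir, lang="eng"):
    """One tesseract argv per image; the output base is out_dir/<image stem>."""
    images = find_images(src_dir)
    stems = [p.stem for p in images]
    clashes = sorted({s for s in stems if stems.count(s) > 1})
    if clashes:  # e.g. scan.jpg and scan.png would both write scan.txt
        raise ValueError(f"images share a base name: {clashes}")
    return [["tesseract", str(p), str(Path(out_dir) / p.stem), "-l", lang] for p in images]


def run_ocr(src_dir, out_dir, lang="eng"):
    """Create out_dir and run tesseract once per image, stopping on the first failure."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for argv in ocr_commands(src_dir, out_dir, lang):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    demo = Path(tempfile.mkdtemp())
    for name in ("a.jpg", "b.JPEG", "c.png", "notes.txt"):
        (demo / name).touch()
    for argv in ocr_commands(demo, demo / "out"):
        print(" ".join(argv))
```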

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread ShreeDevi Kumar
tesstrain.sh generates a file called eng.training_files.txt. You are using the command without the .txt extension; check the name of the generated file and use that. I have found that editing that file also causes errors. On 04-Apr-2017 7:01 PM, wrote: > I am trying t

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread srnsp92
I am trying to train Tesseract 4, and I am getting the following error. Command used: mkdir -p /home/p/Documents/T/engoutput /home/p/Documents/T/tesseract-master/training/lstmtraining -U /home/p/Documents/T/img_frm_3/unicharset \ --script_dir /home/p/Documents/T/TESS_4_ALPHA/langdata-master --debug_

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread ShreeDevi Kumar
See https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread srnsp92
Hello ShreeDevi, can you elaborate on the LSTM step, which is new in Tesseract 4.0, and on the new steps I need to follow to train Tesseract 4? Thank you. On Monday, April 3, 2017 at 8:11:33 PM UTC+5:30, shree wrote: > Saurabh, > It depends on what you want to do with the bash script.

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread srnsp92
Hello ShreeDevi, https://medium.com/apegroup-texts/training-tesseract-for-labels-receipts-and-such-690f452e8f79 The link has a full-fledged tutorial on using and training Tesseract 3.0. Can you please clarify the points below? https://github.com/tesseract-ocr/tes

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-03 Thread Saurabh Srivastav
Shree, actually I want a bash script that runs tesseract and stores the output files in a folder. Kindly help me write this kind of bash script. Thank you.

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-03 Thread ShreeDevi Kumar
Saurabh, it depends on what you want to do with the bash script. Here is a sample of a script I used to compare results using different tessdata files by looping through a set of image files. Google the bash commands to figure out what they do! #!/bin/bash set -vx export TESSDATA_PREFIX=/mnt/c/Users/Use

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-03 Thread Saurabh Srivastav
Hello Shree! Thank you for your help. Could you please help me with how to write a bash script for Tesseract?

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-03-23 Thread ShreeDevi Kumar
To read characters from an image, it is not necessary to train it. Just use an appropriate traineddata. Training is required only if it is a new language or font or some such special circumstance. Read the wiki for documentation. https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usag

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-03-22 Thread Saurabh Srivastav
Thank you, Shree, for your valuable reply. I have now created box files for a particular image and trained on it, but I am still missing something. Could you please tell me what I have to do after creating the box file for that image to make Tesseract read the characters in that image? thanks an

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-03-02 Thread ShreeDevi Kumar
The warning in your screenshot means that your image does not have resolution info. Your OCR output file should have been created. Training 4.0 is not easy. Please see https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM ShreeDevi

[tesseract-ocr] Train tesseract for recognition of a dotted font

2016-06-16 Thread Junmock Lee
Dear all, I'm trying to train tesseract for recognition of a dotted font such as this image. Here is my tif/box file pair that is generated by jT

Re: [tesseract-ocr] Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-10-05 Thread Tom Morris
I think Dmitri's suggestion to start simple is a good one, but, if you need it, don't forget that you've got a lot of other information that can be leveraged to help. The notes all have a fixed aspect ratio (and size?). They've got a relatively standard layout. The denomination is encoded multi

Re: [tesseract-ocr] Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-10-02 Thread Dmitri Silaev
Hi Juan Pablo, here are my thoughts on how I'd approach the initial version of the image processor. I'm not sure OpenCV is the best tool for doing all this, and you're free to choose how to implement it. As far as I understand the idea of your app, bills can be captured in an arbitrary manner, but

Re: [tesseract-ocr] Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-09-27 Thread Juan Pablo Aveggio
Hi Dmitri Silaev, thanks for your useful help. Actually, I have made almost no progress in terms of image preprocessing; I just convert the image to grayscale before applying OCR. But I could not get good training data. The test code is as follows: #include #include #include #include #include usi

Re: [tesseract-ocr] Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-09-26 Thread Dmitri Silaev
Hi Juan Pablo, The problem cannot be solved by Tesseract as is. Even given such perfect images like you've shown, Tesseract would fail since your "characters" are too disjointed, have no meaningful baseline and only happen as singletons. However a simple and robust recognition can be implemented

Re: [tesseract-ocr] Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-09-25 Thread Juan Pablo Aveggio
Hi Dmitri Silaev, thanks for the reply. They are bills (banknotes); sorry for the mistranslation. You can see examples: 2 5 10 20 50

Re: [tesseract-ocr] Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-09-23 Thread Dmitri Silaev
Hi Juan Pablo, The problem seems interesting. However not sure if you can use Tesseract for that. Could you show one or more example tickets? Best regards, Dmitri Silaev www.CustomOCR.com On Tue, Sep 22, 2015 at 2:17 AM, Juan Pablo Aveggio wrote: > Hello > I'm trying to train tesseract for

[tesseract-ocr] Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-09-22 Thread Juan Pablo Aveggio
Hello, I'm trying to train Tesseract to recognize patterns printed on tickets. Each ticket has a unique pattern in a predetermined place that determines its value. As these patterns are not Unicode characters, I assigned them the characters 'a' to 'f'. I created a .tif ima
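Dmitri points out that Tesseract is a poor fit for six isolated symbols, and Tom notes that the notes have a fixed layout. Given both, one simple alternative is nearest-template matching, though this is not necessarily what Dmitri had in mind. The idea: crop the pattern region, binarize it, scale it to a fixed grid, and pick the reference template with the fewest differing pixels. The sketch keeps the 'a' to 'f' labels from the original post and assumes cropping and alignment are already done. The rejection threshold is a guess.

```python
def to_bits(rows):
    """Convert a list of strings ('#' = ink, anything else = paper) to a 0/1 grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def mismatch(a, b):
    """Number of cells where two equally sized 0/1 grids differ (Hamming distance)."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def classify(patch, templates, max_mismatch_ratio=0.2):
    """Return (label, distance) of the closest template, or (None, distance) if none is close."""
    if not templates:
        raise ValueError("no templates given")
    height, width = len(patch), len(patch[0])
    best_label, best_dist = None, None
    for label, tmpl in templates.items():
        if len(tmpl) != height or len(tmpl[0]) != width:
            raise ValueError("patch and templates must have the same size")
        dist = mismatch(patch, tmpl)
        if best_dist is None or dist < best_dist:  # ties keep the first template seen
            best_label, best_dist = label, dist
    if best_dist > max_mismatch_ratio * height * width:
        return None, best_dist  # too far from every pattern: reject rather than guess
    return best_label, best_dist
```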

Re: [tesseract-ocr] Train tesseract for 14-segment display

2015-07-08 Thread Pierre-Henri DAUVERGNE
whether it is the right hammer in this case. > art > --- > 1. https://code.google.com/p/tesseract-ocr/wiki/ControlParams > From: tesser...@googlegroups.com [mailto:tesser...@googlegroups.com] On Behalf Of Pierre-Henri DAUVE

RE: [tesseract-ocr] Train tesseract for 14-segment display

2015-07-08 Thread Art Rhyno .
[mailto:tesseract-ocr@googlegroups.com] On Behalf Of Pierre-Henri DAUVERGNE Sent: Wednesday, July 08, 2015 5:26 AM To: tesseract-ocr@googlegroups.com Subject: Re: [tesseract-ocr] Train tesseract for 14-segment display I also tried different sizes and I have been able to make it work with any. Regardin

Re: [tesseract-ocr] Train tesseract for 14-segment display

2015-07-08 Thread Pierre-Henri DAUVERGNE
> art > --- > 1. http://blog.damiles.com/2008/11/basic-ocr-in-opencv/ > From: tesser...@googlegroups.com [mailto:tesser...@googlegroups.com] On Behalf Of Pierre-Henri DAUVERGNE > Sent: Tuesday, July 07, 2015 8:41 AM

RE: [tesseract-ocr] Train tesseract for 14-segment display

2015-07-07 Thread Art Rhyno .
Behalf Of Pierre-Henri DAUVERGNE Sent: Tuesday, July 07, 2015 8:41 AM To: tesseract-ocr@googlegroups.com Subject: Re: [tesseract-ocr] Train tesseract for 14-segment display I actually can't show you all the characters but I can give you a sample. I have the 10 digits and all letters. I tri

Re: [tesseract-ocr] Train tesseract for 14-segment display

2015-07-07 Thread Pierre-Henri DAUVERGNE
xp0.box” that are producing the “Empty page!!” message? > art > From: tesser...@googlegroups.com [mailto:tesser...@googlegroups.com] On Behalf Of Pierre-Henri DAUVERGNE > Sent: Tuesday, July 07, 2015 3:26 AM > To: tesser...@goog

RE: [tesseract-ocr] Train tesseract for 14-segment display

2015-07-07 Thread Art Rhyno .
@googlegroups.com Subject: Re: [tesseract-ocr] Train tesseract for 14-segment display Actually, I followed this guide (http://blog.ayoungprogrammer.com/2013/01/equation-ocr-part-2-training-characters.html), which is essentially the same as the one you gave me. I am doing all of that. I use qt-box-edi

Re: [tesseract-ocr] Train tesseract for 14-segment display

2015-07-07 Thread Pierre-Henri DAUVERGNE
the one character. > art > --- > 1. http://michaeljaylissner.com/blog/adding-new-fonts-to-tesseract-3-ocr-engine > From: tesser...@googlegroups.com [mailto:tesser...@googlegroups.com] On Behalf Of Pierre-Henri DAUV

RE: [tesseract-ocr] Train tesseract for 14-segment display

2015-07-06 Thread Art Rhyno .
@googlegroups.com Subject: Re: [tesseract-ocr] Train tesseract for 14-segment display OK, so I just tried resizing my image by a factor of 2 and of 4, and it still doesn't work: tesseract says "Empty page!!". However, if I manually link the segments (with the brush tool in GIMP, see here: htt
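Since linking the segments by hand makes the difference, the same effect can probably be automated with a morphological closing (a dilation followed by an erosion). A closing bridges gaps of up to about 2 × radius pixels between neighboring strokes while keeping their outer outline. Below is a dependency-free sketch on a binary grid (1 = ink). In practice, OpenCV's cv2.morphologyEx with cv2.MORPH_CLOSE does the same job, and the radius would need tuning to the real gap width on the display. Pixels outside the image count as paper here, so keep a white margin around the digits.

```python
def dilate(img, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def erode(img, radius=1):
    """Binary erosion: a pixel survives only if its whole square neighborhood is ink."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-radius, radius + 1)
                     for dx in range(-radius, radius + 1)))
             for x in range(w)] for y in range(h)]


def close_gaps(img, radius=1):
    """Morphological closing: join nearby segments of a segmented display."""
    return erode(dilate(img, radius), radius)
```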

Re: [tesseract-ocr] Train tesseract for 14-segment display

2015-07-06 Thread Pierre-Henri DAUVERGNE
tesser...@googlegroups.com [mailto:tesser...@googlegroups.com] On Behalf Of Pierre-Henri DAUVERGNE >> Sent: Friday, July 03, 2015 10:20 AM >> To: tesser...@googlegroups.com >> Subject: [tesseract-ocr] Train tesseract for 14-segment display

Re: [tesseract-ocr] Train tesseract for 14-segment display

2015-07-06 Thread Pierre-Henri DAUVERGNE
> art > From: tesser...@googlegroups.com [mailto:tesser...@googlegroups.com] On Behalf Of Pierre-Henri DAUVERGNE > Sent: Friday, July 03, 2015 10:20 AM > To: tesser...@googlegroups.com > Subject: [tesseract-ocr] Train tesseract for

[tesseract-ocr] Train tesseract for 14-segment display

2015-07-03 Thread Pierre-Henri DAUVERGNE
Hello everyone. I've already posted on Stack Overflow but haven't had an answer yet (http://stackoverflow.com/questions/31131796/14-segment-display-and-tesseract-ocr-with-opencv). I'm looking for a way to accurately OCR a 14-segment display. As you can see in my SO thread, I trained tesseract with

Re: [tesseract-ocr] Train Tesseract to Only Find a Single 17 Character Word

2014-11-11 Thread ShreeDevi Kumar
also see https://groups.google.com/forum/#!topic/tesseract-ocr/et7bS5QRf2o ShreeDevi On Tue, Nov 11, 2014 at 11:02 PM, ShreeDevi Kumar wrote: > Have you tested with the English tra

Re: [tesseract-ocr] Train Tesseract to Only Find a Single 17 Character Word

2014-11-11 Thread ShreeDevi Kumar
Have you tested with the English traineddata from the tessdata git repo? Please see https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html and try these in /path/to/eng.user-patterns: 1-\d\d\d-GOOG-411 www.\n\\\*.com I haven't tried this personally, though. ShreeDevi

[tesseract-ocr] Train Tesseract to Only Find a Single 17 Character Word

2014-11-11 Thread steven
I am working on getting Tesseract to recognize VINs for an application I am developing. I have a clean VIN image (worked around to be black text on a white background). As a first attempt, I have traineddata built with the fonts Courier, HelveticaNeue, LatoBold, LatoLight, OpenSans, and RobotoSlab. I've also
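Independently of training, VINs have structure that can be used to filter OCR output. A VIN has exactly 17 characters and never contains I, O or Q, and North American VINs carry a check digit in position 9. The post-processing sketch below is our own addition, not a Tesseract feature. It maps I/O/Q to 1/0/0 as likely misreads, then keeps only 17-character tokens that pass validation. European VINs do not necessarily carry a valid check digit, which is why the check can be switched off.

```python
import re

# Letter values used by the VIN check-digit calculation (I, O and Q are never used).
TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7, "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)
VIN_RE = re.compile(r"\b[A-Z0-9]{17}\b")
MISREADS = str.maketrans("IOQ", "100")  # assumed OCR confusions, since these never occur


def vin_check_digit(vin: str) -> str:
    """Weighted sum modulo 11; a remainder of 10 is written as 'X'."""
    total = sum(TRANSLITERATION[c] * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_vin(vin: str, require_check_digit: bool = True) -> bool:
    if len(vin) != 17 or any(c not in TRANSLITERATION for c in vin):
        return False
    return not require_check_digit or vin[8] == vin_check_digit(vin)


def vin_candidates(ocr_text: str, require_check_digit: bool = True) -> list[str]:
    """17-character tokens from OCR output that pass validation after misread fixes."""
    tokens = (t.translate(MISREADS) for t in VIN_RE.findall(ocr_text.upper()))
    return [t for t in tokens if is_valid_vin(t, require_check_digit)]
```

For example, vin_candidates("VIN: 1M8GDM9AXKPO42788.") repairs the misread "O" and returns ["1M8GDM9AXKP042788"].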