[tesseract-ocr] OCR of free hand photo of book

2024-01-30 Thread Borneq
First I tested Tesseract on a file generated as a flat image. I generated Lorem Ipsum text: 5 paragraphs, 452 words, 2978 bytes, 24 lines + 4 blank lines; the longest line in my editor was 135 characters. Result: 100% accurate except for two full stops, fantastic. Next, I rotated the image. A rotation of only 0.7 degree caused

Re: [tesseract-ocr] OCR of free hand photo of book

2024-02-01 Thread Borneq
I use Linux and prefer batch processing. I found https://gist.github.com/endolith/334196bac1cac45a4893 (linked from https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#examples). It correctly detects a rotation of 1.00 degree. How can I combine it with Tesseract? -- You received this message because you are subscribed to
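One way to combine a skew detector with Tesseract in a batch job: estimate the angle, rotate the image back by that angle with any image tool, then run `tesseract` on the straightened file. The C++ sketch is not the linked gist (its method is not described in this thread); it is a plain projection-profile search, assuming the dark pixels have already been extracted as (x, y) coordinates. The name `estimateSkewDegrees` and the default search range of ±5° in 0.1° steps are illustrative choices.

```cpp
#include <cmath>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: estimate the skew of text lines (in degrees) from the
// coordinates of dark pixels by trying candidate angles and keeping the one
// whose horizontal projection profile is sharpest.
// Convention: the result "alpha" means text lines satisfy y ~ x*tan(alpha) + c.
double estimateSkewDegrees(const std::vector<std::pair<double, double>>& points,
                           double maxDeg = 5.0, double stepDeg = 0.1) {
    const double kPi = 3.14159265358979323846;
    double bestAngle = 0.0;
    double bestScore = -1.0;
    const int steps = static_cast<int>(std::lround(2.0 * maxDeg / stepDeg));
    for (int k = 0; k <= steps; ++k) {
        const double deg = -maxDeg + k * stepDeg;
        const double t = deg * kPi / 180.0;
        const double s = std::sin(t), c = std::cos(t);
        // Rotate every point by -t and histogram the new y coordinate (one bin per row).
        std::unordered_map<long long, long long> rows;
        for (const auto& p : points) {
            const double yr = -p.first * s + p.second * c;
            ++rows[static_cast<long long>(std::floor(yr))];
        }
        // Ink concentrated in few rows gives a large sum of squared row counts.
        double score = 0.0;
        for (const auto& kv : rows) score += static_cast<double>(kv.second) * kv.second;
        if (score > bestScore) {
            bestScore = score;
            bestAngle = deg;
        }
    }
    return bestAngle;
}
```

The sign convention matters: y points down in image coordinates, so check which direction your rotation tool treats as positive before undoing the angle. Leptonica, which Tesseract uses for image I/O, also has its own deskew routines (for example `pixDeskew`) that may be simpler to call from C++.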

Re: [tesseract-ocr] OCR of free hand photo of book

2024-02-01 Thread Borneq
I found mzucker/page_dewarp on GitHub, a tool for dewarping book pages and converting color images to black and white.

[tesseract-ocr] The position of the text in the image

2024-02-01 Thread Borneq
I want to:
- scan a page to an image
- generate OCR text from the image
- generate a PDF page with the image and the text
I need to know where in the image specific words are. Is there a Tesseract option that gives this information? How can I use Tesseract as a library instead of the command line?
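For word positions, the Tesseract command line can write a TSV file (`tesseract page.png out tsv`) whose rows include each word's bounding box, and the `pdf` output config produces a searchable PDF that places the recognized text over the page image. From C++, `tesseract::TessBaseAPI` exposes the same data (for example through `GetIterator()` and `ResultIterator::BoundingBox`). The sketch only parses the TSV file; `WordBox` and `parseTsvWords` are hypothetical names, and it assumes the 12-column layout written by recent Tesseract versions, where level 5 marks a word.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One recognized word and its pixel bounding box in the source image.
struct WordBox {
    std::string text;
    int left, top, width, height;
    double conf;
};

// Parse Tesseract TSV output with the columns
// level page_num block_num par_num line_num word_num left top width height conf text
// and keep only word-level rows (level == 5) that carry text.
std::vector<WordBox> parseTsvWords(const std::string& tsv) {
    std::vector<WordBox> words;
    std::istringstream in(tsv);
    std::string line;
    bool header = true;
    while (std::getline(in, line)) {
        if (header) { header = false; continue; }  // skip the column-name row
        if (!line.empty() && line.back() == '\r') line.pop_back();
        std::vector<std::string> f;
        std::string field;
        std::istringstream row(line);
        while (std::getline(row, field, '\t')) f.push_back(field);
        // Rows with an empty text column yield only 11 fields and are skipped.
        if (f.size() < 12 || f[0] != "5" || f[11].empty()) continue;
        words.push_back({f[11], std::stoi(f[6]), std::stoi(f[7]),
                         std::stoi(f[8]), std::stoi(f[9]), std::stod(f[10])});
    }
    return words;
}
```

The left/top/width/height values are in pixels of the input image, so they can be used directly to place text over the scanned page.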

Re: Method of tesseract ?

2014-02-10 Thread Borneq
I'm joining this thread. I'm new to Tesseract. What is Tesseract's main algorithm?

Tesseract online is a lot more accurate

2014-03-12 Thread Borneq
I installed Tesseract-OCR. If I process eurotext.tif, the result is fine, but if I try to recognize medium-quality text, for example the first image from https://www.google.com/recaptcha/digitizing, I get these results: Th: llxnuuundge ma Lane nmomc: Elvin; men courage Al Inc ncenz uslem nd Vic:-s, In ur- whereas when I giv

Re: Tesseract online is a lot more accurate

2014-03-12 Thread Borneq
Accuracy improves a lot when I resize the same image to 500%.

Which algorithms does Tesseract use?

2014-03-12 Thread Borneq
Is there any place where the algorithms Tesseract uses are described? There is too much code to analyse.

Re: Tesseract online is a lot more accurate

2014-03-13 Thread Borneq
Maybe preprocessing is important. Resizing increased accuracy a lot. Probably the conversion to a binary image breaks small letters. On Wednesday, 12 March 2014 22:05:45 UTC+1, user zdenop wrote: > > Let's summarize it: > >- You used tesseract >- http://www.free-ocr.com/ uses tesser
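A single global threshold decides each pixel from its gray level alone, so the thin, partly gray strokes of small letters can fall on the background side and break apart. For reference, here is a minimal sketch of Otsu's global threshold (choose the cut that maximizes the between-class variance of the histogram); `otsuThreshold` and `binarize` are hypothetical names, and this is not claimed to be Tesseract's internal code. Running it after upscaling rather than before matches the ordering the resizing experiment above suggests.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Otsu's method: return threshold t so that pixels <= t form the dark class.
int otsuThreshold(const std::vector<std::uint8_t>& gray) {
    std::array<long long, 256> hist{};
    for (std::uint8_t v : gray) ++hist[v];
    const double total = static_cast<double>(gray.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += static_cast<double>(i) * hist[i];
    double sumB = 0.0, wB = 0.0, best = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];                      // weight of the dark class (values <= t)
        if (wB == 0) continue;
        const double wF = total - wB;       // weight of the bright class
        if (wF == 0) break;
        sumB += static_cast<double>(t) * hist[t];
        const double mB = sumB / wB, mF = (sumAll - sumB) / wF;
        // Unnormalised between-class variance; the maximum gives the best split.
        const double between = wB * wF * (mB - mF) * (mB - mF);
        if (between > best) { best = between; bestT = t; }
    }
    return bestT;
}

// Map dark pixels to 0 (ink) and bright pixels to 255 (paper).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray, int t) {
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) out[i] = gray[i] <= t ? 0 : 255;
    return out;
}
```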

[tesseract-ocr] Recognizing narrowing

2014-12-31 Thread Borneq
I am not writing a strictly OCR application; I recognize medical images. After Canny edge detection I get these images: http://i.imgur.com/HIJQupz.png and http://i.imgur.com/UNaUZZ9.png. How can I distinguish these images, which have an eight-like or small g-like narrowing, from simple ellipses and lines? I need OC
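One possible approach, assuming the Canny outline is first filled into a solid mask (for example with OpenCV's `cv::drawContours` and `cv::FILLED`) and the shape's long axis runs roughly vertically: measure the width of every row and look for a "waist", a row much narrower than the widest rows both above and below it. An ellipse has a single-peaked width profile and a line an almost constant one, so neither has such a waist, while an 8-like shape does. The function names and the 0.6 ratio are illustrative assumptions, not a validated medical method.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Width (number of foreground pixels) of each row of a filled binary mask.
std::vector<int> rowWidths(const std::vector<std::vector<std::uint8_t>>& mask) {
    std::vector<int> widths;
    for (const auto& row : mask)
        widths.push_back(static_cast<int>(
            std::count_if(row.begin(), row.end(), [](std::uint8_t v) { return v != 0; })));
    return widths;
}

// True if some interior row is a "waist": narrower than `ratio` times the
// widest row above it AND the widest row below it (e.g. the middle of an 8).
bool hasNarrowing(const std::vector<int>& widths, double ratio = 0.6) {
    const int n = static_cast<int>(widths.size());
    if (n < 3) return false;
    // Running maxima from the top and from the bottom of the profile.
    std::vector<int> above(n), below(n);
    above[0] = widths[0];
    for (int i = 1; i < n; ++i) above[i] = std::max(above[i - 1], widths[i]);
    below[n - 1] = widths[n - 1];
    for (int i = n - 2; i >= 0; --i) below[i] = std::max(below[i + 1], widths[i]);
    for (int i = 1; i < n - 1; ++i) {
        if (widths[i] == 0) continue;  // empty rows between separate blobs are not a waist
        if (widths[i] < ratio * above[i - 1] && widths[i] < ratio * below[i + 1]) return true;
    }
    return false;
}
```

For shapes at other orientations, the mask would first need to be rotated so its principal axis is vertical, or the profile taken along that axis instead of along rows.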

[tesseract-ocr] How to recognize kinds of letters and digits?

2016-12-18 Thread Borneq
void TextWindow::recognize(const char *imagepath) {
    Pix* pixs = pixRead(imagepath);
    if (!pixs) {
        fprintf(stderr, "Cannot open input file: %s\n", imagepath);
        exit(2);
    }
    tesseract::TessBaseAPI api;
    const char* lang = "pol";
    const char* datapath = "/usr/shar