Re: [tesseract-ocr] looking for URLs in screen shots

2019-05-15 Thread Carl Karsten
That's interesting... it's along the lines of what I am hoping for, except only half the presentations have URLs in them, and half of those have only one, so anything manual is too much labor. On Wed, May 15, 2019 at 6:38 PM Lorenzo Bolzani wrote: > > > Hi, > if you are willing to program a little this is wha

Re: [tesseract-ocr] looking for URLs in screen shots

2019-05-15 Thread Lorenzo Bolzani
Hi, if you are willing to program a little, this is what I would try: - OpenCV template matching: extract a few frame fragments containing "https://" from the video, then look for it in all frames (or maybe one frame out of).
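To illustrate the template-matching idea, here is a minimal sketch in pure Python. It slides a small grayscale template (e.g. a crop of the "https://" text) over a frame and scores each position by the sum of squared differences, the measure OpenCV's `cv2.matchTemplate` uses with `cv2.TM_SQDIFF`. On real video frames you would use `cv2.matchTemplate` plus `cv2.minMaxLoc` instead; the function names and the `max_ssd` threshold here are hypothetical and would need tuning per recording.

```python
from typing import List, Tuple

# Grayscale image as rows of pixel values (assumption: 0-255 ints).
Image = List[List[int]]

def ssd_at(frame: Image, template: Image, top: int, left: int) -> int:
    """Sum of squared differences between the template and the frame patch at (top, left)."""
    total = 0
    for r, trow in enumerate(template):
        frow = frame[top + r]
        for c, t in enumerate(trow):
            d = frow[left + c] - t
            total += d * d
    return total

def find_template(frame: Image, template: Image, max_ssd: int) -> List[Tuple[int, int, int]]:
    """Return (top, left, ssd) for every placement with ssd <= max_ssd, best match first."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    hits = []
    for top in range(fh - th + 1):
        for left in range(fw - tw + 1):
            score = ssd_at(frame, template, top, left)
            if score <= max_ssd:
                hits.append((top, left, score))
    hits.sort(key=lambda h: h[2])  # lower SSD means a closer match
    return hits

def frames_with_match(frames: List[Image], template: Image, max_ssd: int) -> List[int]:
    """Indices of frames containing at least one acceptable match (candidates to OCR)."""
    return [i for i, f in enumerate(frames) if find_template(f, template, max_ssd)]
```

Only the frames flagged this way would then need to go through Tesseract, which keeps the OCR work proportional to the slides that actually show a URL.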

[tesseract-ocr] looking for URLs in screen shots

2019-05-15 Thread Carl Karsten
I record tech conference presentations. Using http://hdmi2usb.tv I get a perfect image of the presentations. Often presenters will put a URL on the screen, like here: https://youtu.be/2aOsd9YVQjM?t=1583 OCRing all the text would be great, but I'd like to focus on URLs, and for extra credi
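One way to focus on URLs is to OCR each frame as usual and then filter the recognized text. A minimal sketch, assuming the input is the plain text Tesseract produced for a frame and that only `http://` or `https://` URLs matter (bare domains such as `example.com/talk` are not matched); `extract_urls` is a hypothetical helper name.

```python
import re
from typing import List

# Assumption: only explicit http(s) URLs; stop at whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
# Punctuation that OCR text often leaves attached to the end of a URL.
TRAILING = ".,;:!?)]}"

def extract_urls(ocr_text: str) -> List[str]:
    """Return unique URLs in order of first appearance, with trailing punctuation trimmed."""
    seen = set()
    urls = []
    for match in URL_RE.finditer(ocr_text):
        url = match.group(0).rstrip(TRAILING)
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Because slides stay on screen for many frames, deduplicating keeps each URL once per talk; OCR misreads (e.g. `rn` for `m`) are not corrected here.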

Re: [tesseract-ocr] Tesseract Multipage tiff to multipage pdf

2019-05-15 Thread Shree Devi Kumar
tesseract In\SPTest.tif Out\Test --psm 3 -l rus+eng pdf This should be enough to create a multi page pdf from a multi page tiff. On Wed, May 15, 2019 at 7:27 PM András Jeszenkovits wrote: > Here: tesseract In\SPTest.tif Out\Test --psm 3 -l rus+eng *-c > tessedit_page_number=-1* pdf > > 2019. m
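To run that conversion over many files, the same command can be issued from a script. A minimal sketch, assuming `tesseract` is on the PATH and the `rus` and `eng` traineddata are installed; the helper names are hypothetical. Following the advice in this thread, it does not pass `-c tessedit_page_number=-1`.

```python
import subprocess
from typing import List

def build_tesseract_pdf_cmd(tiff_path: str, out_base: str,
                            langs: str = "rus+eng", psm: int = 3) -> List[str]:
    """Argument list mirroring: tesseract IN OUTBASE --psm N -l LANGS pdf"""
    return ["tesseract", tiff_path, out_base, "--psm", str(psm), "-l", langs, "pdf"]

def tiff_to_pdf(tiff_path: str, out_base: str,
                langs: str = "rus+eng", psm: int = 3) -> str:
    """Run tesseract; it writes OUTBASE.pdf. Raises CalledProcessError on failure."""
    cmd = build_tesseract_pdf_cmd(tiff_path, out_base, langs, psm)
    subprocess.run(cmd, check=True)
    return out_base + ".pdf"
```

For example, `tiff_to_pdf(r"In\SPTest.tif", r"Out\Test")` runs the command quoted above and returns `Out\Test.pdf`.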

[tesseract-ocr] What does --noextract_font_properties do?

2019-05-15 Thread Timothy Snyder
Hey all, quick question: What does --noextract_font_properties do when using tesstrain.sh? I've been using the flag for training since it's used in the training guide on GitHub. However, I can't seem to find any usage information for it. tesstrain.sh doesn't seem to include it in its usage info:

Re: [tesseract-ocr] Tesseract Multipage tiff to multipage pdf

2019-05-15 Thread Zdenko Podobny
Please read my question once again. Zdenko On Wed, May 15, 2019 at 15:57, András Jeszenkovits wrote: > Here: tesseract In\SPTest.tif Out\Test --psm 3 -l rus+eng *-c > tessedit_page_number=-1* pdf > > On Wednesday, May 15, 2019 at 15:51:31 UTC+2, zdenop wrote: >> >> Why are you usi

Re: [tesseract-ocr] Tesseract Multipage tiff to multipage pdf

2019-05-15 Thread András Jeszenkovits
Here: tesseract In\SPTest.tif Out\Test --psm 3 -l rus+eng *-c tessedit_page_number=-1* pdf On Wednesday, May 15, 2019 at 15:51:31 UTC+2, zdenop wrote: > > Why are you using tessedit_page_number? > > Zdenko > > > On Wed, May 15, 2019 at 15:43, András Jeszenkovits wrote: > >> H

Re: [tesseract-ocr] Tesseract Multipage tiff to multipage pdf

2019-05-15 Thread Zdenko Podobny
Why are you using tessedit_page_number? Zdenko On Wed, May 15, 2019 at 15:43, András Jeszenkovits wrote: > Hello! > > Can you help me with this problem? I'm testing the tesseract OCR engine. > The input is a scanned multipage TIFF file. I tried to create a PDF from > that, but the result is alw

[tesseract-ocr] Tesseract Multipage tiff to multipage pdf

2019-05-15 Thread András Jeszenkovits
Hello! Can you help me with this problem? I'm testing the tesseract OCR engine. The input is a scanned multipage TIFF file. I tried to create a PDF from that, but the result is always one page. I used this cmd line: tesseract In\Test.tif Out\TestOutput -l rus+eng -c tessedit_page_number=-1 pdf