date:20180917

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Jeff Breidenbach

Tesseract produces searchable PDF directly. If you really want to use hOCR as an intermediate format, you can, but you will need external software. There are a couple of "hocr2pdf" programs floating around, and "OCRmyPDF" does an admirable job tying things together. That said, going direct should g
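As a minimal sketch of the direct route described above, the snippet below builds the command line for a single Tesseract run. The helper name `tesseract_args` and the file names are illustrative assumptions, not part of the thread. It relies on Tesseract appending the extension to the output base itself, so the base is given without a `.pdf` suffix; listing both the `hocr` and `pdf` configs should yield `scanned.hocr` and `scanned.pdf` from one pass.

```python
import os
import shutil
import subprocess


def tesseract_args(image, output_base, lang="eng", configs=("pdf",)):
    """Build the argv list for one Tesseract run (hypothetical helper).

    output_base carries no extension: Tesseract adds one per config,
    e.g. "scanned" becomes scanned.pdf and scanned.hocr.
    """
    return ["tesseract", image, output_base, "-l", lang, *configs]


if __name__ == "__main__":
    # Illustrative file names; adjust to your own scan.
    args = tesseract_args("scannedFile.png", "scanned", configs=("hocr", "pdf"))
    print(" ".join(args))
    # Only run when Tesseract is installed and the input image exists.
    if shutil.which("tesseract") and os.path.exists("scannedFile.png"):
        subprocess.run(args, check=True)
```

Passing only `("pdf",)` gives the searchable PDF alone; overlaying a separately edited hOCR file onto a PDF still needs one of the external tools mentioned above.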

Re: [tesseract-ocr] combine_lang_model makes no dawg file

2018-09-17 Thread Shree Devi Kumar

I use it as follows and it works. Please check that you are using correct paths for the files. combine_lang_model \ --input_unicharset ./layersan/san.unicharset \ --script_dir ~/langdata \ --words ~/langdata/san/san.wordlist \ --numbers ~/langdata/san/san.numbers \ --puncs ~/langdata/san/san.punc
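The advice to check the paths can be sketched as a small pre-flight check. This is only an illustration under assumptions: the helper names `missing_inputs` and `combine_lang_model_args` are hypothetical, the output directory is made up, and the flags used are limited to ones that appear in this thread plus `--output_dir`.

```python
import os


def missing_inputs(paths):
    """Return the paths that do not exist, after expanding a leading ~."""
    return [p for p in paths if not os.path.exists(os.path.expanduser(p))]


def combine_lang_model_args(lang, unicharset, script_dir, output_dir,
                            words=None, numbers=None, puncs=None, rtl=False):
    """Build the argv list for combine_lang_model (hypothetical helper)."""
    args = ["combine_lang_model",
            "--input_unicharset", unicharset,
            "--script_dir", script_dir,
            "--output_dir", output_dir,
            "--lang", lang]
    # Word, number and punctuation lists are optional inputs.
    for flag, path in (("--words", words), ("--numbers", numbers), ("--puncs", puncs)):
        if path:
            args += [flag, path]
    if rtl:
        args.append("--lang_is_rtl")
    return args


if __name__ == "__main__":
    # Paths taken from the message above; "./output" is an assumption.
    inputs = ["./layersan/san.unicharset", "~/langdata",
              "~/langdata/san/san.wordlist", "~/langdata/san/san.numbers"]
    for path in missing_inputs(inputs):
        print("missing:", path)
    print(" ".join(combine_lang_model_args(
        "san", inputs[0], inputs[1], "./output",
        words=inputs[2], numbers=inputs[3])))
```

Because the script does not pass through a shell, `~` is not expanded automatically, which is why the check calls `os.path.expanduser`; the same expansion would be needed before handing the arguments to `subprocess`.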

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Shree Devi Kumar

I think pdf creation adds a text layer only and there isn't an option to add HOCR to it. @jbreiden can confirm. On Mon, Sep 17, 2018 at 6:10 PM, Monica wrote: > I have tried this, but this is showing the default behaviour. I think the > default output is overlaying on pdf instead of hocr out. >

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Monica

I have tried this, but this is showing the default behaviour. I think the default output is overlaying on pdf instead of hocr out. On Mon, Sep 17, 2018 at 5:47 PM Monica wrote: > Thanks Zdenko for your response. > will "tesseract scannedFile.png scanned.pdf -l eng hocr pdf" overlay on > pdf file

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Monica

Thanks Zdenko for your response. Will "tesseract scannedFile.png scanned.pdf -l eng hocr pdf" overlay on pdf file? On Mon, Sep 17, 2018 at 5:44 PM Zdenko Podobny wrote: > Something like this? > > tesseract scannedFile.png scanned.pdf -l eng hocr pdf > > Zdenko > > > po 17. 9. 2018 o 14:12 monica

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Zdenko Podobny

Something like this? tesseract scannedFile.png scanned.pdf -l eng hocr pdf Zdenko On Mon, 17 Sep 2018 at 14:12, monica kumari wrote: > for OCRing a scanned pdf, > first it is converted to image format then OCRed and gives a temporary > file of pdf/text format and overlays on original scanned pd

[tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread monica kumari

For OCRing a scanned pdf, first it is converted to image format, then OCRed, which gives a temporary file of pdf/text format that is overlaid on the original scanned pdf. I want the output format to be hocr. For this, I ran the command "convert scannedFile.pdf scannedFile.png" and then "tesseract scannedFi

[tesseract-ocr] combine_lang_model makes no dawg file

2018-09-17 Thread Hosein Khoshdel

I used combine_lang_model like this: combine_lang_model --input_unicharset ../combinelangmodel/fas.lstm-unicharset \ --script_dir ../combinelangmodel/sdir \ --outputdir outputdir \ --lang fas \ --lang_is_rtl true \ --words ..\lists\fas.wordlist \ --puncs ..\lists\fa


8 matches
