[tesseract-ocr] Re: Tesseract API to output PDF with txt layer

2018-07-27 Thread Quan Nguyen
Yes, the PDF functionality is exposed in the C API, which Tess4J fully supports. On Friday, July 27, 2018 at 4:46:15 AM UTC-5, PSK wrote: > > I know that Tesseract v4 CLI is able

Re: [tesseract-ocr] Can't symlink into tessdata anymore?

2018-07-27 Thread Zdenko Podobny
If I got it right, that confirms there is no problem/bug related to symlinks, only an outdated ita.traineddata. Right? Zdenko On Fri, 27 Jul 2018 at 9:27, Shree Devi Kumar wrote: > @zdenko podobny > > Please see https://github.com/tesseract-ocr/tessdata/issues/18 > ita.special-words missing #18 >

[tesseract-ocr] Tesseract API to output PDF with txt layer

2018-07-27 Thread PSK
I know that the Tesseract v4 CLI can produce PDF output with a text layer. Is this functionality also available through its API? If so, will Tess4J expose that API to Java, too? (I know that this is a separate product, but maybe someone is

[tesseract-ocr] Page & block/para/line/word/character

2018-07-27 Thread u3536034
Can anyone explain the relation between page and block/para/line/word/character? What does page really represent in Tesseract?

[tesseract-ocr] The applicability of psm (page segmentation modes) in tesseract 4.0.0

2018-07-27 Thread u3536034
Tesseract 4.0.0 switches to an LSTM-based core. I'm wondering whether the page segmentation modes (listed when you type tesseract --help-extra) still apply to the current engine.

Re: [tesseract-ocr] Can't symlink into tessdata anymore?

2018-07-27 Thread Shree Devi Kumar
@zdenko podobny Please see https://github.com/tesseract-ocr/tessdata/issues/18 ita.special-words missing #18 On Fri, Jul 27, 2018 at 11:55 AM Zdenko Podobny wrote: > symlink is a filesystem feature and tesseract uses standard C++ functions for > reading/writing files from the filesystem, so there is n
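Zdenko's point is that a symlink is resolved by the filesystem, so a program that opens files with ordinary calls reads the link target without knowing a link is involved. The sketch below illustrates that behaviour in Python rather than C++, assuming a POSIX system (creating symlinks on Windows may need extra privileges). The directory and file names (`models`, `tessdata`, `demo.traineddata`) are hypothetical stand-ins, not Tesseract's own paths, and no Tesseract code is involved.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple


def read_through_symlink(content: bytes) -> Tuple[bool, bytes]:
    """Write a file, symlink it into a tessdata-like directory, read it back.

    Returns (link_is_symlink, bytes_read_via_link). All names are
    hypothetical placeholders for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        real_dir = root / "models"      # hypothetical location of the real file
        tessdata = root / "tessdata"    # hypothetical tessdata directory
        real_dir.mkdir()
        tessdata.mkdir()

        target = real_dir / "demo.traineddata"
        target.write_bytes(content)

        link = tessdata / "demo.traineddata"
        os.symlink(target, link)        # the link lives in "tessdata"

        # A plain open() follows the link transparently; the caller
        # receives the target's bytes without any symlink-specific code.
        with open(link, "rb") as fh:
            data = fh.read()
        return link.is_symlink(), data


is_link, data = read_through_symlink(b"fake model bytes")
print(is_link, data)
```

If reading through the link returns the target's bytes, as it does here, then a failure in tessdata points to the data file itself (for example, an outdated ita.traineddata) rather than to the symlink.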

[tesseract-ocr] Re: reading a negated document

2018-07-27 Thread Neha Mittal
Can you provide some more detail? What do you mean by "make the numbers easier to read but its not helping the OCR"? Could you share a snippet of your code? On Tuesday, July 24, 2018 at 12:22:29 AM UTC+5:30, grios wrote: > > I'm trying to process the attached image to pull the text elements out.