You have not mentioned which version of tesseract you are using. I tested
just now with tesseract 4.0 alpha, and the PDF contains the original image
with the lines intact. See attached.
However, as Zdenko pointed out earlier, the OCR is NOT accurate.
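A minimal sketch for repeating this test (Python 3.7+, assuming the `tesseract` binary is on PATH; the helper names are my own). It reports the version, runs the same `pdf` command used further down in this thread, and checks whether the output PDF embeds an image at all. The check relies on the fact that image XObjects are streams, and stream dictionaries are never stored inside compressed object streams, so a plain byte search finds them.

```python
import re
import subprocess
from pathlib import Path

# "/Subtype /Image" marks an image XObject; some writers omit the space.
IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image\b")

def tesseract_version() -> str:
    """First line of `tesseract --version` (some builds print it on stderr)."""
    res = subprocess.run(["tesseract", "--version"], capture_output=True, text=True)
    out = (res.stdout or res.stderr).strip()
    return out.splitlines()[0] if out else ""

def ocr_to_pdf(image: str, outbase: str, lang: str = "fra") -> Path:
    """Run the command from this thread: tesseract IMAGE OUTBASE -l LANG pdf."""
    subprocess.run(["tesseract", image, outbase, "-l", lang, "pdf"], check=True)
    return Path(outbase + ".pdf")  # tesseract appends .pdf to the output base

def count_image_xobjects(pdf_bytes: bytes) -> int:
    """Count image XObject dictionaries in the raw PDF bytes."""
    return len(IMAGE_XOBJECT.findall(pdf_bytes))
```

Usage: `print(tesseract_version())`, then `count_image_xobjects(ocr_to_pdf("img.tif", "out").read_bytes())`. A result of 0 means the PDF carries no image; 1 or more only tells you an image is embedded, so whether the lines survived still has to be checked by eye.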

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Mon, Dec 11, 2017 at 9:57 AM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> PDF generation is done by tesseract itself. I have cc'ed Jeff, who is the main
> developer of the PDF-related code.
>
> On 10-Dec-2017 11:03 PM, "lelive" <o....@groupe-archibald.fr> wrote:
>
> OK, thanks for your reply!
>
> If I use
> tesseract img.tif out -l fra pdf
>
> which software does the conversion to PDF?
>
> Olivier
>
>
> On Sunday, 10 December 2017 10:02:30 UTC+1, shree wrote:
>
>> I think the question is about PDF generation, not the actual OCR.
>>
>> The resulting PDF should include the original image with the text layer.
>> It seems the lines are deleted in the generated PDF.
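To illustrate what "original image with the text layer" means, here is a simplified sketch of a searchable-PDF page content stream: the scanned image is painted unchanged, then the recognized words are drawn with text render mode 3 (invisible), so they can be searched and selected without altering what you see. If the image layer really is the original, ruling lines are part of it and should survive. The resource names `/Im1` and `/F1`, the `(text, x, y)` word format and the ASCII-only escaping are assumptions for illustration; tesseract's own renderer differs in detail (for example it uses its own glyph-less font).

```python
from typing import List, Tuple

def pdf_escape(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string (...)."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def page_content_stream(
    width_pt: float,
    height_pt: float,
    words: List[Tuple[str, float, float]],
    font_size: float = 10.0,
) -> str:
    """Build a content stream: original image first, invisible text on top.

    words holds (text, x, y) in PDF points, origin at the bottom-left.
    Assumes the page resources define the image as /Im1 and a font as /F1.
    """
    ops = [
        "q",
        f"{width_pt:g} 0 0 {height_pt:g} 0 0 cm",  # scale the 1x1 image space to the page
        "/Im1 Do",  # paint the scanned image unchanged, ruling lines included
        "Q",
        "BT",
        "3 Tr",  # render mode 3: glyphs are neither filled nor stroked
        f"/F1 {font_size:g} Tf",
    ]
    for text, x, y in words:
        # Tm places each word absolutely; Tj shows the (invisible) string.
        ops.append(f"1 0 0 1 {x:g} {y:g} Tm ({pdf_escape(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)
```

For an A4 page (595 x 842 pt), `print(page_content_stream(595, 842, [("Total", 72, 700)]))` shows the image being drawn before any text operators.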
>>
>> On 10-Dec-2017 1:25 PM, "lelive" <o....@groupe-archibald.fr> wrote:
>>
>>> Hello,
>>> yes, I know that, but I have the same problem with ordinary tables on an A4
>>> page. All the lines disappear!
>>>
>>> Please help!
>>>
>>> On Thursday, 7 December 2017 10:05:15 UTC+1, zdenop wrote:
>>>>
>>>> I do not think that images like this are appropriate for OCR (at least
>>>> not for tesseract). IMO you should preprocess them and pass only the
>>>> areas containing text to tesseract.
>>>>
>>>> Tesseract is very sensitive to noise (at least the 3.x versions).
>>>>
>>>> Zdenko
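One simple way to do that kind of preprocessing, as a sketch only and not necessarily what Zdenko had in mind: on a binarized page, treat rows that are almost entirely ink as table rules and cut the page into bands between them, keeping only bands that contain ink; each band can then be cropped and passed to tesseract on its own. The 80% threshold and the list-of-rows bitmap format are assumptions; vertical rules would need the same treatment on columns, and real scans usually need deskewing first so that a rule fills a single row.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # 1 = ink, 0 = background; rows listed top to bottom

def rule_rows(img: Bitmap, min_fill: float = 0.8) -> List[int]:
    """Indices of rows whose ink coverage is at least min_fill (likely table rules)."""
    return [y for y, row in enumerate(img) if row and sum(row) / len(row) >= min_fill]

def text_bands(img: Bitmap, min_fill: float = 0.8) -> List[Tuple[int, int]]:
    """Split the page at rule rows and return (top, bottom) row ranges,
    bottom exclusive, for the bands between rules that contain some ink."""
    rules = set(rule_rows(img, min_fill))
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y in range(len(img) + 1):
        at_break = y == len(img) or y in rules  # a rule row or the page end
        if not at_break and start is None:
            start = y  # first row of a new band
        elif at_break and start is not None:
            if any(any(row) for row in img[start:y]):  # skip empty bands
                bands.append((start, y))
            start = None
    return bands
```

Each `(top, bottom)` pair is a candidate text area: crop rows `top` to `bottom - 1` from the original image and OCR that crop, while keeping the uncropped image for the visible layer.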
>>>>
>>>> On Wed, Dec 6, 2017 at 8:32 PM, lelive <o....@groupe-archibald.fr>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>> I use tesseract on technical documents to produce searchable PDFs.
>>>>> But if the picture contains lines, the lines are deleted in the
>>>>> resulting PDF file.
>>>>>
>>>>> <https://lh3.googleusercontent.com/-WjPOK7PSDWU/WihFUAPd3HI/AAAAAAAAAAM/p73chP6zlVYwOJPbqsNnSJzD99CNrMuBACLcBGAs/s1600/Capture%2Bdu%2B2017-12-06%2B20-29-46.png>
>>>>> <https://lh3.googleusercontent.com/-4QLOB3yBVNY/WihFeR0s3qI/AAAAAAAAAAQ/jVqkpYsVKwk-NwiwPiTB9wjIX_ZZRO6_gCLcBGAs/s1600/Capture%2Bdu%2B2017-12-06%2B20-29-06.png>
>>>>>
>>>>> Is there a solution, or a parameter that tells tesseract not to "clean"
>>>>> the picture?
>>>>>
>>>>> Many thanks for your help!
>>>>>
>>>>> Olivier
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/f8d3df29-c900-4172-a9ce-9892463f0634%40googlegroups.com
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>

Attachment: Capture-fra.pdf
Description: Adobe PDF document
