You have not mentioned which version of tesseract you are using. I tested just now with tesseract 4.0 alpha, and the generated PDF contains the original image with the lines intact. See attached. However, as Zdenko pointed out earlier, the OCR is NOT accurate.
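For anyone wanting to repeat the check, here is a minimal sketch that reports the installed version and rebuilds the command quoted further down in the thread (`tesseract img.tif out -l fra pdf`). The helper names `build_pdf_command` and `tesseract_version` are made up for illustration, and `img.tif` / `out` are just the file names from the thread; it assumes the `tesseract` binary is on PATH and the `fra` language data is installed.

```python
import os
import shutil
import subprocess


def build_pdf_command(image_path, output_base, lang="fra"):
    """Return the argv for the invocation quoted in the thread:
    tesseract img.tif out -l fra pdf
    (hypothetical helper, not part of tesseract itself)."""
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def tesseract_version():
    """Return the text printed by `tesseract --version`,
    or None if the binary is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    # Some builds print the version on stderr, so merge both streams.
    result = subprocess.run(
        ["tesseract", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(tesseract_version() or "tesseract not found on PATH")
    cmd = build_pdf_command("img.tif", "out")
    print(" ".join(cmd))
    # Only run the conversion when both the binary and the input exist.
    if shutil.which("tesseract") and os.path.exists("img.tif"):
        subprocess.run(cmd, check=True)  # should write out.pdf
```

Quoting the version output when reporting a problem like this makes it easier to tell whether the behaviour differs between 3.x and 4.0 alpha.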
ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Dec 11, 2017 at 9:57 AM, ShreeDevi Kumar <shreesh...@gmail.com> wrote:

> PDF generation is done by tesseract only. I had cc'ed Jeff, who is the main
> developer for the PDF-related code.
>
> On 10-Dec-2017 11:03 PM, "lelive" <o....@groupe-archibald.fr> wrote:
>
> OK, thanks for your reply!
>
> If I use
> tesseract img.tif out -l fra pdf
>
> which software does the conversion to PDF?
>
> Olivier
>
> On Sunday, 10 December 2017 10:02:30 UTC+1, shree wrote:
>
>> I think the question is related to PDF generation and not the actual OCR.
>>
>> The resulting PDF should include the original image with the text layer.
>> It seems the lines are deleted in the generated PDF.
>>
>> On 10-Dec-2017 1:25 PM, "lelive" <o....@groupe-archibald.fr> wrote:
>>
>>> Hello,
>>> yes, I know that, but I have the same problem with classic tables on an
>>> A4 page. All the lines disappear!
>>>
>>> Help please!
>>>
>>> On Thursday, 7 December 2017 10:05:15 UTC+1, zdenop wrote:
>>>>
>>>> I do not think that images like this are appropriate for OCR (at least
>>>> not for tesseract). IMO you should do preprocessing of them and pass to
>>>> tesseract only areas with text.
>>>>
>>>> Tesseract is very noise sensitive (at least 3.x version).
>>>>
>>>> Zdenko
>>>>
>>>> On Wed, Dec 6, 2017 at 8:32 PM, lelive <o....@groupe-archibald.fr>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>> I use tesseract for technical documents and produce searchable PDFs.
>>>>> But if the picture contains lines, the lines are deleted in the
>>>>> resulting PDF file.
>>>>>
>>>>> <https://lh3.googleusercontent.com/-WjPOK7PSDWU/WihFUAPd3HI/AAAAAAAAAAM/p73chP6zlVYwOJPbqsNnSJzD99CNrMuBACLcBGAs/s1600/Capture%2Bdu%2B2017-12-06%2B20-29-46.png>
>>>>> <https://lh3.googleusercontent.com/-4QLOB3yBVNY/WihFeR0s3qI/AAAAAAAAAAQ/jVqkpYsVKwk-NwiwPiTB9wjIX_ZZRO6_gCLcBGAs/s1600/Capture%2Bdu%2B2017-12-06%2B20-29-06.png>
>>>>>
>>>>> Is there a solution or a parameter to tell tesseract not to "clean"
>>>>> the picture?
>>>>>
>>>>> Many thanks for your help!
>>>>>
>>>>> Olivier

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVaAHSj3KS%3D6Xx7acyUZJ%2BZJsDkD4NP33J9uGhGGYEpjw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Capture-fra.pdf
Description: Adobe PDF document