That scan is quite low resolution, so it is hard to say how well any OCR
will work. I'd expect better than garbage, but with a lot of errors.

The DPI is quite significant: it is what lets the OCR judge whether a
group of pixels is noise or a glyph, because it implies the minimum font
size in pixels. 72 or 96 is a good guess for screenshots (or 200 for a
retina screen).
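
To put rough numbers on that, here is a small shell sketch of the
arithmetic. It assumes only that a point is 1/72 of an inch; the 10 pt size
and the 300 DPI comparison are illustrative values I picked, and px_height
is a made-up helper name, not part of ocrmypdf.

```shell
# Nominal height in pixels of text of a given point size at a given DPI.
# 1 point = 1/72 inch, so pixels = points * dpi / 72.
px_height() {
    awk -v pt="$1" -v dpi="$2" 'BEGIN { printf "%.1f\n", pt * dpi / 72 }'
}

px_height 10 64    # 10 pt text at the 64 DPI from your command -> 8.9
px_height 10 300   # the same text at 300 DPI, for comparison   -> 41.7
```

At 64 DPI, 10 pt text is only about 9 pixels tall, which leaves very little
room to tell a small glyph from speckle.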

One possibility is that ocrmypdf fails to encode Cyrillic under the current
settings and available system fonts. If you have problems with all Cyrillic
images (even high quality scans), you could try adding the arguments
--pdf-renderer=tesseract --output-type=pdf. That seems to work better for
non-Latin languages.
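
As a sketch of what that would look like with your file names: ocr_rus
below is a hypothetical wrapper name of my own, and the default of 96 DPI
is just the screenshot guess, not a measured value. The ocrmypdf arguments
themselves are the ones from your command plus the two renderer settings.

```shell
# Hypothetical helper: OCR a Russian-language image using the Tesseract
# PDF renderer and plain PDF output instead of PDF/A.
# Usage: ocr_rus INPUT_IMAGE OUTPUT_PDF [DPI]
ocr_rus() {
    # DPI defaults to 96 (a guess for screenshots) when not given.
    ocrmypdf -l rus --image-dpi "${3:-96}" \
        --pdf-renderer=tesseract --output-type=pdf \
        "$1" "$2"
}
```

You would then run

    ocr_rus 111684498_large_2.jpg 111684498_large_2.pdf

and compare the text you can copy out of the new PDF with the earlier one.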

If you are willing to install the latest version instead of the Ubuntu
version, you could use the --sidecar argument to see what text is being
found. That would tell you whether the issue is the PDF encoding or the
image itself.
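
A sketch of that check, assuming a version of ocrmypdf recent enough to
have --sidecar. The ocr_with_text name and the fixed 96 DPI are my own
placeholders.

```shell
# Hypothetical helper: OCR an image and also write the recognized text to
# a separate plain-text file, so it can be inspected without going through
# the PDF.
# Usage: ocr_with_text INPUT_IMAGE OUTPUT_PDF TEXT_FILE
ocr_with_text() {
    # 96 DPI is a guess for a screenshot-like image, not a measured value.
    ocrmypdf -l rus --image-dpi 96 --sidecar "$3" "$1" "$2"
}
```

For example:

    ocr_with_text 111684498_large_2.jpg out.pdf out.txt
    cat out.txt

If out.txt is readable Russian but text copied from the PDF is not, the
problem is in the PDF encoding. If out.txt is garbage too, the image is the
limiting factor.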

Aside: The "just print" feature would not have been helpful here even if it
worked.



On Sun, 4 Jun 2017 at 05:11 david braun <1687...@bugs.launchpad.net>
wrote:

> Sorry for the delay.
> I'm trying to translate the text in the attached image to English. I have
> loaded the tesseract RUS language, and executing
> $ ocrmypdf -l rus --image-dpi 64 111684498_large_2.jpg 111684498_large_2.pdf
> completes with the following messages
>    INFO - Input file is not a PDF, checking if it is an image...
>    INFO - Input file is an image
>    INFO - Input image has no ICC profile, assuming sRGB
>    INFO - Image seems valid. Try converting to PDF...
>    INFO - Successfully converted to PDF, processing...
> WARNING -    1: [tesseract] unsure about page orientation
>    INFO - Output file is a PDF/A-2B (as expected)
> But Google translate produces garbage.
> I was hoping to see what was being done by ocrmypdf to see if I could
> figure out what might be the cause.
>
> BTW - I chose the DPI randomly - how significant is this parameter?
>
>
> On Fri, May 26, 2017 at 12:51 AM, James R Barlow
> <1687...@bugs.launchpad.net> wrote:
>
> > The code makes decisions at runtime based on the input file, so an
> > argument to skip executing all intermediates doesn't give an accurate
> > picture of what will happen. There is a --flowchart argument that
> > produces an SVG file showing the processing path, which helps
> > development a lot, but it's probably not helpful to anyone else.
> >
> > What sort of use did you have for it?
> > On Thu, May 25, 2017 at 17:56 david braun <1687...@bugs.launchpad.net>
> > wrote:
> >
> > > That's unfortunate! Any reason why you removed the options?
> > >
> > > --
> > > You received this bug notification because you are subscribed to
> > > Ubuntu.
> > > https://bugs.launchpad.net/bugs/1687308
> > >
> > > Title:
> > >   ocrmypdf program and man page disagree about options
> > >
> > > Status in ocrmypdf package in Ubuntu:
> > >   Incomplete
> > >
> > > Bug description:
> > >   The man page for ocrmypdf claims there is a "--just-print" option,
> > >   but the program rejects this. The man page also claims that "-n"
> > >   does the same. It doesn't. The option is accepted but nothing
> > >   obvious happens.
> > >
> > >   ProblemType: Bug
> > >   DistroRelease: Ubuntu 17.04
> > >   Package: ocrmypdf 4.3.5-2
> > >   ProcVersionSignature: Ubuntu 4.10.0-20.22-generic 4.10.8
> > >   Uname: Linux 4.10.0-20-generic x86_64
> > >   ApportVersion: 2.20.4-0ubuntu4
> > >   Architecture: amd64
> > >   CurrentDesktop: Unity:Unity7
> > >   Date: Sun Apr 30 13:55:46 2017
> > >   EcryptfsInUse: Yes
> > >   InstallationDate: Installed on 2015-05-31 (699 days ago)
> > >   InstallationMedia: Ubuntu 14.04.2 LTS "Trusty Tahr" - Release amd64 (20150218.1)
> > >   PackageArchitecture: all
> > >   ProcEnviron:
> > >    LANGUAGE=en_US
> > >    PATH=(custom, no user)
> > >    XDG_RUNTIME_DIR=<set>
> > >    LANG=en_US.UTF-8
> > >    SHELL=/bin/bash
> > >   SourcePackage: ocrmypdf
> > >   UpgradeStatus: Upgraded to zesty on 2017-04-28 (1 days ago)
> > >
> > > To manage notifications about this bug go to:
> > > https://bugs.launchpad.net/ubuntu/+source/ocrmypdf/+bug/1687308/+subscriptions
> > >
> > >
> >
>
>
> ** Attachment added: "111684498_large_2.jpg"
>
> https://bugs.launchpad.net/bugs/1687308/+attachment/4888804/+files/111684498_large_2.jpg
>
> ** Attachment added: "111684498_large_2.pdf"
>
> https://bugs.launchpad.net/bugs/1687308/+attachment/4888805/+files/111684498_large_2.pdf
>

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
