As mentioned on the Google Code issue page, a simple but probably
incorrect workaround can be used to deal with content that touches the
left edge of the page:
--- /usr/share/ocropus/scripts/lib/hocr.lua.orig 2010-04-27 22:05:29.000000000 +0200
+++ /usr/share/ocropus/scripts/lib/hocr.lua 2010-07-28 17:08:14.000000000 +0200
@@ -24,8 +24,8 @@
 hocr = {}
 function hocr.parse_rectangle(s, h)
-    local x0, y0, x1, y1 = s:match('^%s*(%d*)%s*(%d*)%s*(%d*)%s*(%d*)%s*$')
-    assert(x0 and y0 and x1 and y1, "rectangle parsing error")
+    local x0, y0, x1, y1 = s:match('^%s*(-?%d*)%s*(%d*)%s*(%d*)%s*(%d*)%s*$')
+    assert(x0 and y0 and x1 and y1, "rectangle parsing error " .. s)
     return rectangle(x0+0, h-1-y1, x1+0, h-1-y0)
 end
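
To make the effect of the pattern change easier to see, here is a rough
Python sketch of the same parsing step. It is not OCRopus code: the
names ORIGINAL, PATCHED and parse_rectangle are made up for
illustration, the Lua classes %s and %d are only approximated by \s and
\d, and a plain tuple stands in for rectangle().

```python
import re

# Approximate Python translations of the two Lua patterns in the patch.
# Assumption: Lua's %s and %d behave like \s and \d for this input.
ORIGINAL = re.compile(r'^\s*(\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*$')
PATCHED = re.compile(r'^\s*(-?\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*$')


def parse_rectangle(pattern, s, h):
    """Parse 'x0 y0 x1 y1' and flip the y axis against page height h."""
    m = pattern.match(s)
    if m is None:
        raise ValueError("rectangle parsing error " + s)
    x0, y0, x1, y1 = (int(g) for g in m.groups())
    # Same arithmetic as the Lua return statement; a tuple replaces rectangle().
    return (x0, h - 1 - y1, x1, h - 1 - y0)


# The unpatched pattern rejects a negative x0 outright.
print(ORIGINAL.match("-1 0 100 200"))
# The patched pattern accepts it and passes the negative value through.
print(parse_rectangle(PATCHED, "-1 0 100 200", 300))
# Only the first field accepts a minus sign, so a negative y0 still fails.
print(PATCHED.match("0 -1 100 200"))
```

Note that the patched pattern does not clamp anything: a negative x0
ends up in the resulting rectangle as it is.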
If ocroscript is called from ocrodjvu, this "fix" triggers an assertion
failure in ocrodjvu:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/share/ocrodjvu/lib/_ocrodjvu.py", line 445, in page_thread
    result = self.process_page(page)
  File "/usr/share/ocrodjvu/lib/_ocrodjvu.py", line 425, in process_page
    page_size=size
  File "/usr/share/ocrodjvu/lib/hocr.py", line 462, in extract_text
    scan_result = scan(doc.find('/body'), settings)
  File "/usr/share/ocrodjvu/lib/hocr.py", line 428, in scan
    _rotate(zone, settings.rotation)
  File "/usr/share/ocrodjvu/lib/hocr.py", line 324, in _rotate
    assert obj.bbox[:2] == (0, 0)
AssertionError
Another dodgy workaround for that is:
--- /usr/share/ocrodjvu/lib/hocr.py.orig 2010-07-28 17:20:17.000000000 +0200
+++ /usr/share/ocrodjvu/lib/hocr.py 2010-07-28 17:20:50.000000000 +0200
@@ -321,7 +321,7 @@
     if xform is None:
         assert isinstance(obj, Zone)
         assert obj.type == const.TEXT_ZONE_PAGE
-        assert obj.bbox[:2] == (0, 0)
+        # assert obj.bbox[:2] == (0, 0)
         page_size = obj.bbox[2:]
         if (rotation // 90) & 1:
             xform = decode.AffineTransform((0, 0) + tuple(reversed(page_size)), (0, 0) + page_size)
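
Why this one is dodgy, as far as I can tell: the line after the disabled
assert takes the page size straight from bbox[2:], which is only the
real width and height if the box starts at (0, 0). A small Python
sketch of the difference follows. The two helper names and the bbox
values are invented for illustration, and I am assuming the page box
ends up with a negative x0, which I have not verified.

```python
def page_size_assuming_origin(bbox):
    # Mirrors `page_size = obj.bbox[2:]` from the patched hunk.
    return tuple(bbox[2:])


def page_size_from_extent(bbox):
    # Hypothetical alternative: width and height from both corners.
    x0, y0, x1, y1 = bbox
    return (x1 - x0, y1 - y0)


well_formed = (0, 0, 2480, 3508)   # illustrative values only
shifted = (-1, 0, 2480, 3508)      # assumed page box with negative x0

# With the origin at (0, 0) both approaches agree.
print(page_size_assuming_origin(well_formed), page_size_from_extent(well_formed))
# With a shifted origin, bbox[2:] no longer matches the real extent.
print(page_size_assuming_origin(shifted), page_size_from_extent(shifted))
```

So with the assert commented out, the rotation transform is built from
a page size that may be off by the amount the origin is shifted.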