On Thursday, November 30, 2023 at 11:11:08 AM UTC-5 scott...@gmail.com wrote:
> I'm running an image through Tesseract via a PHP library (https://github.com/thiagoalessio/tesseract-ocr-for-php).

There's a bunch of really useful information missing (e.g. the version of Tesseract), but fortunately this is easily reproducible with the current development version.

> The output seems to contain two potential matches for a single character.

That's not what's happening. It's actually recognizing two separate characters, a 0 followed by an O, although I'm not sure why. The engine does consider the correct string, but the incorrect string comes out with the better score. I'm not familiar enough with the internals to interpret it, but the debug output is below in case someone else wants to give it a go. As you can see, it considers both P01.01 and P0O1.01, but picks the latter because it has a (marginally) better total rating: 9.1921 versus 10.0694, where the lower rating wins.

Tom

Processing word with lang eng at:Bounding box=(21,46)->(143,75)
Trying word using lang eng, oem 1
Created window Convolve of size 530, 1700
Created window ConvNL of size 530, 2000
Created window Lfys64 of size 530, 2000
Created window Lfx96 of size 530, 1534
Created window Lrx96 of size 530, 1534
Created window Lfx512 of size 530, 2000
Created window Output of size 530, 1761
Created window LSTMForward of size 1418, 580
<null>=110 On [0, 2), scores= 100(i=83=0.00155) 100(P=28=0.00331), Mean=99.9928, max=99.9963
P=28 On [2, 8), scores= 26.8(<null>=110=69) 97.1(p=103=2.5) 90.3(<null>=110=9.35) 3.18(<null>=110=96.8) 4.89e-05(<null>=110=100) 6.7e-05(<null>=110=92.2), Mean=36.2217, max=97.0573
0=33 On [8, 9), scores= 67(O=21=31.2), Mean=67.0461, max=67.0461
O=21 On [9, 13), scores= 56.8(0=33=42.5) 10.5(<null>=110=82) 1.01e-05(<null>=110=100) 7.72e-06(<null>=110=100), Mean=16.8307, max=56.8101
1=34 On [13, 18), scores= 64.2(<null>=110=34.6) 99.4(l=87=0.559) 98.5(<null>=110=1.46) 5.78(<null>=110=94.2) 1.12e-05(<null>=110=100), Mean=53.576, max=99.3538
.=23 On [18, 23), scores= 83.1(<null>=110=16.9) 100(,=15=0.00592) 94.8(<null>=110=5.24) 0.0434(<null>=110=100) 2.41e-07(<null>=110=100), Mean=55.5837, max=99.9923
0=33 On [23, 28), scores= 52.6(<null>=110=47.4) 99.8(O=21=0.124) 82(<null>=110=17.9) 0.000255(<null>=110=100) 3.13e-11(<null>=110=100), Mean=46.8875, max=99.8451
1=34 On [28, 33), scores= 46.1(<null>=110=53.8) 99.8(l=87=0.0986) 99.1(<null>=110=0.894) 0.586(<null>=110=99.4) 8.5e-09(<null>=110=100), Mean=49.1089, max=99.8028
0 null_char score=-0.191493, c=-0.191493, perm=2, hash=0
1 null_char score=-0.382826, c=-0.191333, perm=2, hash=0 prev:null_char score=-0.191493, c=-0.191493, perm=2, hash=0
2 label=28, uid=30=P [50 ]A score=-0.66966, c=-0.286834, perm=2, hash=1c prev:null_char score=-0.382826, c=-0.191333, perm=2, hash=0
3 label=28, uid=30=P [50 ]A score=-0.928113, c=-0.258453, perm=2, hash=1c prev:label=28, uid=30=P [50 ]A score=-0.66966, c=-0.286834, perm=2, hash=1c
4 label=28, uid=30=P [50 ]A score=-1.12845, c=-0.200338, perm=2, hash=1c prev:label=28, uid=30=P [50 ]A score=-0.928113, c=-0.258453, perm=2, hash=1c
5 null_char score=-1.39284, c=-0.264391, perm=2, hash=1c prev:label=28, uid=30=P [50 ]A score=-1.12845, c=-0.200338, perm=2, hash=1c
6 null_char score=-1.58412, c=-0.191278, perm=2, hash=1c prev:null_char score=-1.39284, c=-0.264391, perm=2, hash=1c
7 null_char score=-1.95755, c=-0.373434, perm=2, hash=1c prev:null_char score=-1.58412, c=-0.191278, perm=2, hash=1c
8 label=33, uid=35=0 [30 ]0 score=-3.04833, c=-1.09078, perm=2, hash=c45 prev:null_char score=-1.95755, c=-0.373434, perm=2, hash=1c
9 label=21, uid=23=O [4f ]A score=-4.51186, c=-1.46353, perm=2, hash=55200 prev:label=33, uid=35=0 [30 ]0 score=-3.04833, c=-1.09078, perm=2, hash=c45
10 label=21, uid=23=O [4f ]A score=-4.87753, c=-0.365671, perm=2, hash=55200 prev:label=21, uid=23=O [4f ]A score=-4.51186, c=-1.46353, perm=2, hash=55200
11 null_char score=-5.06878, c=-0.191256, perm=2, hash=55200 prev:label=21, uid=23=O [4f ]A score=-4.87753, c=-0.365671, perm=2, hash=55200
12 null_char score=-5.26093, c=-0.192142, perm=2, hash=55200 prev:null_char score=-5.06878, c=-0.191256, perm=2, hash=55200
13 label=34, uid=36=1 [31 ]0 score=-5.47957, c=-0.218643, perm=2, hash=24e8e22 prev:null_char score=-5.26093, c=-0.192142, perm=2, hash=55200
14 label=34, uid=36=1 [31 ]0 score=-5.68541, c=-0.205837, perm=2, hash=24e8e22 prev:label=34, uid=36=1 [31 ]0 score=-5.47957, c=-0.218643, perm=2, hash=24e8e22
15 label=34, uid=36=1 [31 ]0 score=-5.91023, c=-0.224826, perm=2, hash=24e8e22 prev:label=34, uid=36=1 [31 ]0 score=-5.68541, c=-0.205837, perm=2, hash=24e8e22
16 label=34, uid=36=1 [31 ]0 score=-6.10178, c=-0.191543, perm=2, hash=24e8e22 prev:label=34, uid=36=1 [31 ]0 score=-5.91023, c=-0.224826, perm=2, hash=24e8e22
17 null_char score=-6.29319, c=-0.191418, perm=2, hash=24e8e22 prev:label=34, uid=36=1 [31 ]0 score=-6.10178, c=-0.191543, perm=2, hash=24e8e22
18 label=23, uid=25=. [2e ]p score=-6.48452, c=-0.191326, perm=2, hash=1000fa0d5 prev:null_char score=-6.29319, c=-0.191418, perm=2, hash=24e8e22
19 label=23, uid=25=. [2e ]p score=-6.67594, c=-0.191424, perm=2, hash=1000fa0d5 prev:label=23, uid=25=. [2e ]p score=-6.48452, c=-0.191326, perm=2, hash=1000fa0d5
20 label=23, uid=25=. [2e ]p score=-6.8672, c=-0.191259, perm=2, hash=1000fa0d5 prev:label=23, uid=25=. [2e ]p score=-6.67594, c=-0.191424, perm=2, hash=1000fa0d5
21 null_char score=-7.05943, c=-0.192229, perm=2, hash=1000fa0d5 prev:label=23, uid=25=. [2e ]p score=-6.8672, c=-0.191259, perm=2, hash=1000fa0d5
22 null_char score=-7.2507, c=-0.191266, perm=2, hash=1000fa0d5 prev:null_char score=-7.05943, c=-0.192229, perm=2, hash=1000fa0d5
23 label=33, uid=35=0 [30 ]0 score=-7.44305, c=-0.192357, perm=2, hash=6f06c6bc7c prev:null_char score=-7.2507, c=-0.191266, perm=2, hash=1000fa0d5
24 label=33, uid=35=0 [30 ]0 score=-7.63779, c=-0.194738, perm=2, hash=6f06c6bc7c prev:label=33, uid=35=0 [30 ]0 score=-7.44305, c=-0.192357, perm=2, hash=6f06c6bc7c
25 label=33, uid=35=0 [30 ]0 score=-7.83177, c=-0.193978, perm=2, hash=6f06c6bc7c prev:label=33, uid=35=0 [30 ]0 score=-7.63779, c=-0.194738, perm=2, hash=6f06c6bc7c
26 null_char score=-8.02303, c=-0.19126, perm=2, hash=6f06c6bc7c prev:label=33, uid=35=0 [30 ]0 score=-7.83177, c=-0.193978, perm=2, hash=6f06c6bc7c
27 null_char score=-8.21431, c=-0.191279, perm=2, hash=6f06c6bc7c prev:null_char score=-8.02303, c=-0.19126, perm=2, hash=6f06c6bc7c
28 label=34, uid=36=1 [31 ]0 score=-8.40869, c=-0.194379, perm=2, hash=3023f02bb9e6 prev:null_char score=-8.21431, c=-0.191279, perm=2, hash=6f06c6bc7c
29 label=34, uid=36=1 [31 ]0 score=-8.60438, c=-0.195692, perm=2, hash=3023f02bb9e6 prev:label=34, uid=36=1 [31 ]0 score=-8.40869, c=-0.194379, perm=2, hash=3023f02bb9e6
30 label=34, uid=36=1 [31 ]0 score=-8.79638, c=-0.192, perm=2, hash=3023f02bb9e6 prev:label=34, uid=36=1 [31 ]0 score=-8.60438, c=-0.195692, perm=2, hash=3023f02bb9e6
31 null_char score=-9.00085, c=-0.204469, perm=2, hash=3023f02bb9e6 prev:label=34, uid=36=1 [31 ]0 score=-8.79638, c=-0.192, perm=2, hash=3023f02bb9e6
32 null_char score=-9.1921, c=-0.191251, perm=2, hash=3023f02bb9e6 prev:null_char score=-9.00085, c=-0.204469, perm=2, hash=3023f02bb9e6
Second choice path:
2 30=P [50 ]A r=1.12845, c=-0.286834, s=0, e=0, perm=2
8 35=0 [30 ]0 r=3.98936, c=-2.11872, s=0, e=0, perm=2
13 36=1 [31 ]0 r=1.86124, c=-0.636992, s=0, e=0, perm=2
18 25=. [2e ]p r=0.765426, c=-0.191424, s=0, e=0, perm=2
23 35=0 [30 ]0 r=0.964568, c=-0.194738, s=0, e=0, perm=2
28 36=1 [31 ]0 r=1.36033, c=-0.204469, s=0, e=0, perm=2
Path total rating = 10.0694
2 30=P [50 ]A r=1.12845, c=-0.286834, s=0, e=0, perm=2
8 35=0 [30 ]0 r=1.91988, c=-1.09078, s=0, e=0, perm=2
9 23=O [4f ]A r=1.8292, c=-1.46353, s=0, e=0, perm=2
13 36=1 [31 ]0 r=1.22425, c=-0.224826, s=0, e=0, perm=2
18 25=. [2e ]p r=0.765426, c=-0.191424, s=0, e=0, perm=2
23 35=0 [30 ]0 r=0.964568, c=-0.194738, s=0, e=0, perm=2
28 36=1 [31 ]0 r=1.36033, c=-0.204469, s=0, e=0, perm=2
Path total rating = 9.1921
Best choice: accepted=0, adaptable=0, done=1 : Lang result : P0O1.01 : R=9.1921, C=-10.2447, F=1, Perm=2, xht=[0,3.40282e+38], ambig=0
pos NORM NORM NORM NORM NORM NORM NORM
str P 0 O 1 . 0 1
state: 1 1 1 1 1 1 1
C -0.287 -1.091 -1.464 -0.225 -0.191 -0.195 -0.204
1 new words better than 0 old words: r: 9.1921 v 0 c: -10.2447 v 0 valid dict: 0 v 0
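For anyone who wants to check the arithmetic behind that choice, here is a minimal Python sketch that adds up the per-character `r=` values from the two paths in the debug output. It is not Tesseract's own code; the names `paths` and `total_rating` are made up for illustration. It also assumes that the lower total rating wins, which is what this log suggests, since the accepted result carries R=9.1921.

```python
# Per-character ratings (the "r=" values) copied from the two paths
# printed in the debug output. Nothing here calls Tesseract itself.
paths = {
    "P01.01": [
        ("P", 1.12845), ("0", 3.98936), ("1", 1.86124),
        (".", 0.765426), ("0", 0.964568), ("1", 1.36033),
    ],
    "P0O1.01": [
        ("P", 1.12845), ("0", 1.91988), ("O", 1.8292), ("1", 1.22425),
        (".", 0.765426), ("0", 0.964568), ("1", 1.36033),
    ],
}


def total_rating(path):
    """Sum the per-character ratings of one candidate path."""
    return sum(rating for _, rating in path)


totals = {text: total_rating(path) for text, path in paths.items()}

# Assumption: the candidate with the lowest total rating is the one picked.
best = min(totals, key=totals.get)

for text, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{text}: {total:.4f}")
print("picked:", best)
```

The sums reproduce the two "Path total rating" lines. The whole difference comes from the stretch between x=8 and x=17: reading it as "0", "O", "1" costs 1.91988 + 1.8292 + 1.22425, while reading it as just "0", "1" costs 3.98936 + 1.86124, so the path with the extra O ends up cheaper overall.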