Invert the image.

Results using tessdata_best/eng - LSTM engine

$ tesseract legacy-invert.jpg - --psm 6
063.433

$ tesseract legacy-300.jpg - --psm 6
063.433

$ tesseract legacy-144.jpg - --psm 6
063.433
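Every reading quoted in this thread (the box-file rows and the 063.433 results above) appears to follow one fixed shape: three digits, a dot, three digits. Assuming that holds for every frame, the following minimal Python sketch builds the same tesseract command with a digits-only whitelist and rejects any output that does not match the pattern. The function names, the regex, and the ddd.ddd assumption are illustrative and do not come from the thread. `tessedit_char_whitelist` is a Tesseract config variable, but whether it is honoured depends on the Tesseract version and engine in use.

```python
import re

# Assumption (not from the thread): every depth reading looks like "063.433",
# i.e. three digits, a dot, three digits.
DEPTH_PATTERN = re.compile(r"\d{3}\.\d{3}")


def build_command(image_path, whitelist="0123456789."):
    """Return the argv list for the tesseract call shown above.

    If a whitelist is given, it is appended as a config variable so that
    only digits and the dot are candidates for recognition.
    """
    cmd = ["tesseract", image_path, "-", "--psm", "6"]
    if whitelist:
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def parse_depth(ocr_text):
    """Return the reading if it matches the assumed pattern, else None."""
    candidate = ocr_text.strip()
    return candidate if DEPTH_PATTERN.fullmatch(candidate) else None
```

The list from `build_command("legacy-invert.jpg")` could be passed to `subprocess.run(..., capture_output=True, text=True)`, with the captured stdout handed to `parse_depth`. Frames that come back as `None` can then be flagged for manual review instead of being silently accepted.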

On Sun, Nov 1, 2020 at 8:37 PM Cailey McVay <cailey.m.mcvay...@dartmouth.edu> wrote:

> Here is an example of the sample image. I believe we are using the legacy
> engine. Does this help?
>
> On Saturday, October 31, 2020 at 11:15:46 PM UTC-4 shree wrote:
>
>> >When we use tesseract on the images without the trained language we
>> receive outputs that are accurate about 50% of the time.
>>
>> You haven't shared a sample image. Sometimes preprocessing the images,
>> using a whitelist in case of limited character set can be the solution
>> rather than training.
>>
>> On Sun, Nov 1, 2020, 03:29 Cailey McVay <cailey.m...@dartmouth.edu>
>> wrote:
>>
>>> Hello!
>>> I am working on a project that is trying to read borehole video depths.
>>> We trained a new language to read these numbers called NTS. When we use
>>> tesseract on the images without the trained language we receive outputs
>>> that are accurate about 50% of the time. However when we use the new
>>> language, we receive no output at all. Is it possible that we overtrained
>>> tesseract to not recognize any of the images? I will attach below our box
>>> file, unicharset file, box trained file, pffmtable file, and normproto
>>> file. Our shapetable file processes but then returns an empty file. Could
>>> something be wrong with our shapetable? And if so, how could we fix that?
>>>
>>> Box File for the first five images:
>>> 0 3 1 14 19 0
>>> 9 18 0 29 20 0
>>> 3 33 1 46 19 0
>>> . 50 1 56 19 0
>>> 2 64 1 75 19 0
>>> 5 76 1 93 19 0
>>> 2 92 1 111 19 0
>>> 0 4 1 15 19 1
>>> 8 19 1 30 19 1
>>> 3 34 1 46 19 1
>>> . 54 1 57 5 1
>>> 4 65 1 77 19 1
>>> 1 82 1 91 19 1
>>> 4 96 1 107 19 1
>>> 0 3 1 15 19 2
>>> 8 19 1 30 19 2
>>> 6 34 1 46 19 2
>>> . 53 1 57 5 2
>>> 8 65 1 77 19 2
>>> 3 80 1 91 19 2
>>> 9 95 1 107 19 2
>>> 0 4 1 15 19 3
>>> 8 17 1 31 19 3
>>> 8 32 1 46 19 3
>>> . 52 2 58 8 3
>>> 1 64 0 77 20 3
>>> 8 80 1 91 19 3
>>> 5 96 1 107 19 3
>>> 0 3 1 15 19 4
>>> 8 19 1 30 19 4
>>> 7 34 1 47 19 4
>>> . 53 1 58 9 4
>>> 5 65 1 77 19 4
>>> 6 80 1 92 19 4
>>> 4 95 0 109 20 4
>>> 0 4 1 15 19 5
>>> 7 19 1 30 19 5
>>> 5 34 1 46 19 5
>>> . 53 1 57 5 5
>>> 3 65 1 76 19 5
>>> 1 82 1 90 19 5
>>> 3 96 1 107 19 5
>>>
>>>
>>> Unicharset:
>>> 14
>>> NULL 0 Common 0
>>> Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69
>>> 6e 65 64 ]a
>>> |Broken|0|1 21 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
>>> 0 8 0,255,0,255,0,0,0,0,0,0 Common 3 2 3 0 # 0 [30 ]0
>>> 9 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 9 # 9 [39 ]0
>>> 3 8 0,255,0,255,0,0,0,0,0,0 Common 5 2 5 3 # 3 [33 ]0
>>> . 22 0,255,0,255,0,0,0,0,0,0 Common 6 6 6 . # . [2e ]p
>>> 2 8 0,255,0,255,0,0,0,0,0,0 Common 7 2 7 2 # 2 [32 ]0
>>> 5 8 0,255,0,255,0,0,0,0,0,0 Common 8 2 8 5 # 5 [35 ]0
>>> 8 8 0,255,0,255,0,0,0,0,0,0 Common 9 2 9 8 # 8 [38 ]0
>>> 4 8 0,255,0,255,0,0,0,0,0,0 Common 10 2 10 4 # 4 [34 ]0
>>> 1 8 0,255,0,255,0,0,0,0,0,0 Common 11 2 11 1 # 1 [31 ]0
>>> 6 8 0,255,0,255,0,0,0,0,0,0 Common 12 2 12 6 # 6 [36 ]0
>>> 7 8 0,255,0,255,0,0,0,0,0,0 Common 13 2 13 7 # 7 [37 ]0
>>>
>>>
>>> NTS.font.exp0.tr file:
>>> font 0 3 1 14 19 0
>>> 4
>>> mf 16
>>> -0.085041896 0.30783021 0.27617577 0 0 0
>>> -0.25234067 0.27376649 0.089746617 0.13718249 0 0
>>> -0.28155157 0.0045010448 0.47040343 0.25 0 0
>>> -0.25234067 -0.26476437 0.08974655 0.36281759 0 0
>>> -0.085041896 -0.29882804 0.27617577 0.5 0 0
>>> -0.031931162 -0.21447986 0.1730229 0.96998096 0 0
>>> -0.11690831 0.020721853 0.43796182 0.75 0 0
>>> -0.031931162 0.23970276 0.1699543 0.5 0 0
>>> 0.24424461 0.072628468 0.47339222 0.76789355 0 0
>>> 0.1353676 0.30783021 0.16464323 0 0 0
>>> 0.10615671 0.18941826 0.14627755 0.37934926 0 0
>>> 0.15926743 -0.011719763 0.30170703 0.25 0 0
>>> 0.10615671 -0.19663697 0.12619166 0.090763755 0 0
>>> 0.1353676 -0.29882804 0.16464323 0.5 0 0
>>> 0.27079996 -0.26476437 0.12619169 0.59076369 0 0
>>> 0.29735535 -0.19663697 0.086383387 0.85538673 0 0
>>> cn 1
>>> 0.36328125 0.35781249 0.2421875 0.1484375
>>> if 73
>>> 133 69 248
>>> 119 72 248
>>> 104 75 248
>>> 97 82 192
>>> 97 95 192
>>> 97 107 192
>>> 97 120 192
>>> 97 132 192
>>> 97 145 192
>>> 97 157 192
>>> 97 170 192
>>> 97 182 192
>>> 104 188 128
>>> 119 188 128
>>> 133 188 128
>>> 135 206 0
>>> 123 206 0
>>> 111 206 0
>>> 99 206 0
>>> 88 206 0
>>> 76 206 0
>>> 66 201 35
>>> 59 193 35
>>> 55 182 64
>>> 55 168 64
>>> 55 155 64
>>> 55 142 64
>>> 55 128 64
>>> 55 115 64
>>> 55 101 64
>>> 55 88 64
>>> 55 75 64
>>> 59 64 93
>>> 66 55 93
>>> 76 51 128
>>> 88 51 128
>>> 99 51 128
>>> 111 51 128
>>> 123 51 128
>>> 135 51 128
>>> 145 184 97
>>> 154 175 97
>>> 163 167 97
>>> 168 156 64
>>> 168 143 64
>>> 168 130 64
>>> 168 118 64
>>> 168 105 64
>>> 168 92 64
>>> 163 82 23
>>> 154 77 23
>>> 145 71 23
>>> 148 51 128
>>> 162 51 128
>>> 176 51 128
>>> 187 53 151
>>> 196 59 151
>>> 205 65 151
>>> 207 72 219
>>> 200 81 219
>>> 196 92 192
>>> 196 105 192
>>> 196 118 192
>>> 196 130 192
>>> 196 143 192
>>> 196 156 192
>>> 195 168 204
>>> 191 179 204
>>> 188 190 204
>>> 184 200 204
>>> 176 206 0
>>> 162 206 0
>>> 148 206 0
>>> tb 1
>>> 64 251 114
>>>
>>>
>>> pffmtable:
>>> NULL 0
>>> Joined 0
>>> |Broken|0|1 0
>>> 0 0
>>> 9 0
>>> 3 0
>>> . 0
>>> 2 0
>>> 5 0
>>> 8 0
>>> 4 0
>>> 1 0
>>> 6 0
>>> 7 0
>>>
>>> NTS.normproto file:
>>> linear essential -0.250000 0.750000
>>> linear non-essential 0.000000 1.000000
>>> linear essential 0.000000 1.000000
>>> linear essential 0.000000 1.000000
>>>
>>> 0 1
>>> significant elliptical 34
>>> 0.364775 0.371404 0.241039 0.150391
>>> 0.000400 0.000416 0.000400 0.000400
>>>
>>> 9 1
>>> significant elliptical 13
>>> 0.372897 0.418750 0.241286 0.157752
>>> 0.000400 0.004734 0.000400 0.001087
>>>
>>> 3 1
>>> significant elliptical 16
>>> 0.365479 0.385596 0.247070 0.143799
>>> 0.000400 0.003148 0.000400 0.000702
>>>
>>> . 1
>>> significant elliptical 27
>>> 0.081019 0.055483 0.060619 0.050492
>>> 0.000400 0.000400 0.000400 0.000400
>>>
>>> 2 1
>>> significant elliptical 10
>>> 0.354297 0.359492 0.248828 0.138672
>>> 0.000400 0.000400 0.000400 0.000400
>>>
>>> 5 1
>>> significant elliptical 10
>>> 0.363672 0.350859 0.248047 0.144922
>>> 0.000400 0.000400 0.000400 0.000400
>>>
>>> 8 1
>>> significant elliptical 19
>>> 0.365543 0.378536 0.234786 0.141653
>>> 0.000400 0.000400 0.000400 0.000400
>>>
>>> 4 1
>>> significant elliptical 9
>>> 0.325521 0.274219 0.215278 0.128038
>>> 0.000400 0.000400 0.000400 0.000400
>>>
>>> 1 1
>>> significant elliptical 11
>>> 0.320312 0.217259 0.248580 0.091974
>>> 0.000400 0.000400 0.000400 0.000400
>>>
>>> 6 1
>>> significant elliptical 20
>>> 0.360156 0.370703 0.238281 0.143164
>>> 0.000400 0.000400 0.000400 0.000400
>>>
>>> 7 1
>>> significant elliptical 20
>>> 0.448633 0.243359 0.242969 0.113477
>>> 0.000400 0.000400 0.000400 0.000400
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/9e3a6851-0311-4148-af1f-b61999f38977n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/9e3a6851-0311-4148-af1f-b61999f38977n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.