It turns out that using PageSegMode.SingleBlock helped.

On Wednesday, November 13, 2024 at 8:22:48 AM UTC-6 Mark Bussey wrote:

> This is my first attempt at using Tesseract OCR. I am using Tesseract-ocr v3.02 (I
> know it's old, but I didn't see the need to include the neural network code) to
> read the digits in an image of a Sudoku puzzle.
> I am writing a C# .NET application. I want to read an unfinished Sudoku
> puzzle from my local newspaper's e-edition and finish it on my laptop
> screen. Trivial, or so I thought.
>
> Here is an example input image.
> [image: SudokuBoard.jpg]
> I crop each sub-square individually and process it, so that I get
> just one digit at a time.
> The code applies an offset to shrink each sub-square, to exclude
> the grid lines around the digits in the image.
> Here is my code:
>
> using System;
> using System.Drawing;   // Bitmap
> using System.IO;
> using Tesseract;        // TesseractEngine, Page, Pix, Rect, PageSegMode, ImageFormat
>
> static void ReadSudoku()
> {
>     string docPath = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
>
>     // Dispose the bitmap and the engine when done; one engine can process every cell.
>     using (Bitmap img = new Bitmap(Path.Combine(docPath, "SudokuBoard.jpg")))
>     using (TesseractEngine engine = new TesseractEngine("./tessdata", "eng", EngineMode.Default))
>     {
>         // Restrict recognition to the digits that can appear in a Sudoku.
>         engine.SetVariable("tessedit_char_whitelist", "123456789");
>
>         int i = 1;         // row number, used in the debug file names
>         int iOffset = 10;  // pixels trimmed from the cell to skip the grid lines
>
>         for (int y = 0; y < img.Height - 10; y += img.Height / 9)
>         {
>             int j = 0;     // column number
>             for (int x = 0; x < img.Width - 10; x += img.Width / 9)
>             {
>                 ++j;
>
>                 Rect rectangle = new Rect(x + iOffset, y + iOffset,
>                     (img.Width / 9) - (2 + iOffset), (img.Height / 9) - iOffset);
>
>                 try
>                 {
>                     // A Page must be disposed before the engine can process the next region.
>                     using (Page page = engine.Process(img, rectangle, PageSegMode.SingleChar))
>                     using (Pix pix = page.GetThresholdedImage())
>                     {
>                         // Save the thresholded cell image for inspection.
>                         pix.Save(Path.Combine(docPath, "SudokuOcrImg_" + i + "," + j + ".bmp"),
>                             ImageFormat.Bmp);
>
>                         // Print the first recognized digit, or "." for an empty cell.
>                         string strOcrText = page.GetText().Trim();
>                         if (strOcrText.Length > 0 && char.IsDigit(strOcrText[0]))
>                             Console.WriteLine(strOcrText[0]);
>                         else
>                             Console.WriteLine(".");
>                     }
>                 }
>                 catch (Exception e)
>                 {
>                     Console.WriteLine("exception: " + e.Message);
>                 }
>             }
>
>             ++i;
>         }
>     }
> }
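A sketch of an alternative way to compute the crop rectangles, assuming the board fills the whole image and the grid is evenly spaced (this is not from the original post): each cell's edges are computed from the exact fractional cell size, rounded independently, and then shrunk by the same inset on all four sides. `CellRect`, `SudokuGrid.CellRects`, and the `inset` parameter are hypothetical names chosen for this example; each result would still need to be converted to the wrapper's `Rect(x, y, width, height)` before calling `engine.Process`. It assumes C# 10 or later for `record struct`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plain rectangle type; convert to Tesseract's Rect(x, y, width, height) before use.
public readonly record struct CellRect(int X, int Y, int Width, int Height);

public static class SudokuGrid
{
    // Returns the 81 cell rectangles in row-major order,
    // each shrunk by `inset` pixels on every side (left, right, top and bottom alike).
    public static List<CellRect> CellRects(int imageWidth, int imageHeight, int inset)
    {
        var cells = new List<CellRect>(81);
        double cellW = imageWidth / 9.0;   // fractional cell size, no integer truncation
        double cellH = imageHeight / 9.0;

        for (int row = 0; row < 9; row++)
        {
            for (int col = 0; col < 9; col++)
            {
                // Exact cell edges, each rounded on its own so errors don't accumulate.
                int left = (int)Math.Round(col * cellW);
                int top = (int)Math.Round(row * cellH);
                int right = (int)Math.Round((col + 1) * cellW);
                int bottom = (int)Math.Round((row + 1) * cellH);

                int w = right - left - 2 * inset;
                int h = bottom - top - 2 * inset;
                if (w <= 0 || h <= 0)
                    throw new ArgumentException("inset is too large for this image size", nameof(inset));

                cells.Add(new CellRect(left + inset, top + inset, w, h));
            }
        }
        return cells;
    }
}
```

For example, `SudokuGrid.CellRects(img.Width, img.Height, 10)` returns the 81 rectangles in the same row-major order as the nested loops above, so the index `row * 9 + col` identifies a cell. Rounding each edge separately keeps the size remainder from piling up at the right and bottom of the board.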
>
> I process the entire image one row at a time, 9 cells per row, and
> output a dot for each empty cell.
> I'm wondering about the information dumped to the console between the valid
> digits: the blocks that start with "Bottom=0, top=76, base=0, x=0".
> Here is the console output:
> .
> 4
> .
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> 2
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> .
> .
> .
> 1
> 6
> .
> 3
> 5
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> 7
> .
> .
> 3     <-  this is from a blank sub-square
> 9     <-  this is from a blank sub-square
> 2
> 4
> 7
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> 3
> 3     <-  this is from a blank sub-square
> .       <-  this is missing the value "6"
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> 7
> 5
> .
> .
> .
> .
> .
> .     <-  this is missing value "2"
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> .
> 9
> .
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> .
> 6
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> .
> .
> 7     <-  this is from a blank sub-square
> 1
> 4
> .
> 5
> 7     <-  this is from a blank sub-square
> 9
> .
> 6
> 3
> 4
> .
> .
> .
> .
> 1
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> 4
> 5
> .
> 3
> .     <-  this is missing the value "9"
> .
> .
> .
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> 1
> Bottom=0, top=76, base=0, x=0
>
> Total count=0
> Min=0.00 Really=0
> Lower quartile=0.00
> Median=0.00, ile(0.5)=0.00
> Upper quartile=0.00
> Max=0.00 Really=0
> Range=1
> Mean= 0.00
> SD= 0.00
> .
> .
> 7
> .
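The one-character-per-line output above is hard to compare with the board. A possible tidy-up, assuming the 81 results are first collected into an array in row-major order: normalize each cell's OCR text to a single character, then print the grid as nine lines of nine characters. `SudokuOutput`, `CellChar`, and `FormatGrid` are hypothetical helpers, not part of Tesseract; `CellChar` also trims any whitespace or newlines that `page.GetText()` may append.

```csharp
using System;
using System.Text;

public static class SudokuOutput
{
    // Maps the raw OCR text for one cell to a single character:
    // the first character if it is a digit 1-9, otherwise '.' for an empty or unreadable cell.
    public static char CellChar(string ocrText)
    {
        string text = (ocrText ?? string.Empty).Trim();
        return text.Length > 0 && text[0] >= '1' && text[0] <= '9' ? text[0] : '.';
    }

    // Formats 81 cell characters (row-major) as nine lines of nine characters each.
    public static string FormatGrid(char[] cells)
    {
        if (cells == null || cells.Length != 81)
            throw new ArgumentException("expected 81 cells", nameof(cells));

        var sb = new StringBuilder();
        for (int row = 0; row < 9; row++)
            sb.Append(cells, row * 9, 9).Append('\n');
        return sb.ToString();
    }
}
```

In the loop, `cells[(i - 1) * 9 + (j - 1)] = SudokuOutput.CellChar(page.GetText());` could replace the `Console.WriteLine` calls, and `Console.Write(SudokuOutput.FormatGrid(cells));` would print the whole board once at the end.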
>
>
> I am including the Pix images of each processed sub-square; there are 81
> of them, sorry. Why do some of them contain extra noise?
> It looks like I am not getting the center of each sub-square, as I thought I
> was.
> [images: SudokuOcrImg_1,1.bmp through SudokuOcrImg_9,9.bmp, the 81 thresholded
> cell images, one row of nine images per line]
> Note on SudokuOcrImg_5,9.bmp: this 9 doesn't show the bar at the top edge.
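Working through the `Rect` arithmetic in the code above suggests one reason for the noise. With `cellW = img.Width / 9` and `iOffset = 10`, the crop starts at `x + 10` and has width `cellW - 12`, so it ends at `x + cellW - 2`: 10 px are trimmed on the left but only 2 px on the right. Vertically, it starts at `y + 10` with height `cellH - 10`, so it ends exactly at `y + cellH` and nothing is trimmed at the bottom. The crop is therefore shifted right and down, toward the grid lines on those sides. Separately, stepping by the integer `img.Width / 9` places cell `k` at `k * (img.Width / 9)`, which can fall up to about 7 px short of the true position by the last column when the width is not a multiple of 9. The small check below only reproduces that arithmetic; `CropCheck.Insets` is a hypothetical helper, not part of Tesseract.

```csharp
public static class CropCheck
{
    // Reproduces the original formula Rect(x + off, y + off, cellW - (2 + off), cellH - off)
    // and returns how many pixels it trims from each side of one grid cell.
    public static (int Left, int Right, int Top, int Bottom) Insets(int imageWidth, int imageHeight, int off)
    {
        int cellW = imageWidth / 9;    // integer division, as in the original loop step
        int cellH = imageHeight / 9;

        int width = cellW - (2 + off);
        int height = cellH - off;

        int left = off;
        int right = cellW - (off + width);    // always 2, whatever off is
        int top = off;
        int bottom = cellH - (off + height);  // always 0, whatever off is
        return (left, right, top, bottom);
    }
}
```

For a 900 by 900 board, `CropCheck.Insets(900, 900, 10)` returns `(10, 2, 10, 0)`, i.e. the asymmetric margins described above.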
>
>
> I hope someone can lead me toward a solution for what seems like such a 
> simple problem.
>
> Thanks,
> Mark Bussey

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion visit 
https://groups.google.com/d/msgid/tesseract-ocr/c3a1f836-2768-44fa-9789-86d6cb82da7bn%40googlegroups.com.
