Good collection of segmentation algorithms. Dan Bloomberg updated the segmentation algorithms in Leptonica some time back. You may want to take a look at those too.
Tesseract also uses Leptonica, but I think it relies on the older algorithms.

On Sat, Jul 11, 2020 at 9:19 PM Rainer Verteidiger <materialdefender2...@gmail.com> wrote:

> Dear all,
>
> I'm looking for a list (not
> https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty)
> comparing various segmenters (AI-based or otherwise) that could be used
> instead of Tesseract's built-in segmenter, and also one comparing GUIs that
> could be used for improving automatic segmentation results, i.e. for
> further training of an AI-based segmenter or for smoothing out errors in
> the results of a non-trainable one.
>
> Here are the ones I'm currently aware of (excluding vapourware and
> abandoned/unmaintained projects):
>
> Segmenters:
> - https://github.com/lquirosd/P2PaLA (AI-based; does both bounding boxes
> and baselines)
> - https://github.com/mittagessen/kraken (AI-based; the old version did
> bounding boxes, and judging from the Issues it seems to be switching to
> baselines now)
>
> GUIs:
> - https://transkribus.eu/Transkribus/ (desktop client that seems to use
> P2PaLA on the server side; many features are cloud-only, but it has a nice,
> intuitive editing UI)
> - https://github.com/mauvilsa/nw-page-editor (UI not as user-friendly;
> needs a lot of getting used to, but seems quite powerful)
> - https://github.com/mittagessen/kraken (the old version produces HTML pages
> that can be edited and saved again)
> - https://wiki.gnome.org/Apps/OCRFeeder (uses a home-brewed XML format,
> sadly no PageXML, etc.)
>
> Any input would be appreciated :)
>
> Best regards
>
> Rainer
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/6b3e8d94-2bf8-49a7-a1b7-db928b5e92a2o%40googlegroups.com
> .

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com