Dear all,

I'm looking for a list (other than https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty) comparing segmenters (AI-based or otherwise) that could be used in place of Tesseract's built-in segmenter. I'm also looking for a comparison of GUIs for correcting automatic segmentation results, either to further train an AI-based segmenter or to fix errors in the output of a non-trainable one.

Here are the ones I'm currently aware of (excluding vapourware and abandoned/unmaintained projects):

Segmenters:
- https://github.com/lquirosd/P2PaLA (AI-based; does both bounding boxes and baselines)
- https://github.com/mittagessen/kraken (AI-based; the old version did bounding boxes, and it seems to be switching to baselines now, judging from the Issues)

GUIs:
- https://transkribus.eu/Transkribus/ (desktop client that seems to use P2PaLA on the server side; many features are cloud-only, but the editing UI is nice and intuitive)
- https://github.com/mauvilsa/nw-page-editor (UI is not as user-friendly and takes a lot of getting used to, but seems quite powerful)
- https://github.com/mittagessen/kraken (the old version produces HTML pages that can be edited and saved again)
- https://wiki.gnome.org/Apps/OCRFeeder (uses a home-grown XML format; sadly no PageXML, etc.)

Any input would be appreciated :)

Best regards
Rainer