Dear all,

I'm looking for a list (other than 
https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty) 
that compares segmenters (AI-based or otherwise) that could be used 
instead of Tesseract's built-in segmenter, and for a second list that 
compares GUIs for correcting automatic segmentation results, whether to 
further train an AI-based segmenter or to fix errors in the output of a 
non-trainable one.

Here are the ones I'm currently aware of (excluding vapourware and 
abandoned/unmaintained projects):

Segmenters:
- https://github.com/lquirosd/P2PaLA (AI-based; does both bounding boxes 
and baselines)
- https://github.com/mittagessen/kraken (AI-based; the old version did 
bounding boxes, but it seems to be switching to baselines now, judging from 
the Issues)

GUIs:
- https://transkribus.eu/Transkribus/ (desktop client that seems to use 
P2PaLA on the server side; many features cloud-only, but nice, intuitive 
editing UI)
- https://github.com/mauvilsa/nw-page-editor (UI not as user-friendly; 
takes a lot of getting used to, but seems quite powerful)
- https://github.com/mittagessen/kraken (old version produces HTML pages 
that can be edited and saved again)
- https://wiki.gnome.org/Apps/OCRFeeder (uses a home-brewed XML format; sadly 
no PageXML, etc.)

Any input would be appreciated :)

Best regards

Rainer

