Re: [tesseract-ocr] Re: Traineddata files

2024-02-19 Thread Tom Morris
On Monday, February 19, 2024 at 1:30:37 AM UTC-5 argo...@gmail.com wrote:
> ... My question now is why Tesseract does not take PDF. PDFs are images, no?
PDF files can contain text, graphics, images, or a mix of them all. If you have PDF files that contain images, you can extract them using utiliti
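As a rough sketch of the extract-then-OCR route described above: the message is cut off before it names any utility, so the choice of Poppler's `pdfimages` here is an assumption, as is `bre` being the language code of the Breton traineddata file. The function names and the `img` file prefix are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, prefix):
    # Poppler's pdfimages writes embedded images as <prefix>-000.png, <prefix>-001.png, ...
    return ["pdfimages", "-png", str(pdf_path), str(prefix)]


def tesseract_command(image_path, output_base, lang="bre"):
    # Tesseract writes the recognized text to <output_base>.txt.
    # Assumption: "bre" is the code of the installed Breton traineddata.
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_pdf_images(pdf_path, work_dir, lang="bre"):
    """Extract the images embedded in a PDF, then OCR each one."""
    for tool in ("pdfimages", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_command(pdf_path, work_dir / "img"), check=True)
    text_files = []
    for image in sorted(work_dir.glob("img-*.png")):
        base = image.with_suffix("")  # e.g. work/img-000
        subprocess.run(tesseract_command(image, base, lang), check=True)
        text_files.append(Path(str(base) + ".txt"))
    return text_files
```

Calling `ocr_pdf_images("scan.pdf", "work")` (both names are placeholders) would return the paths of the text files produced, one per extracted image. This only helps for PDFs whose pages are stored as images; a PDF that already contains text has nothing for OCR to recover.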

Re: [tesseract-ocr] Re: Traineddata files

2024-02-18 Thread Philippe Argouarch
Thanks for answering. I found the Breton Tesseract data. My question now is why Tesseract does not take PDF. PDFs are images, no?
Regards,
Philippe
On Wed, Feb 14, 2024 at 20:59, Tom Morris wrote:
> On Tuesday, February 13, 2024 at 12:51:35 AM UTC-5 argo...@gmail.com wrote:
> > What if there i

[tesseract-ocr] Re: Traineddata files

2024-02-14 Thread Tom Morris
On Tuesday, February 13, 2024 at 12:51:35 AM UTC-5 argo...@gmail.com wrote:
> What if there are no traineddata files for a language? How do I start building a traineddata file for the Breton language?
Searching the archives / group for "training from scratch" should turn up lots of previous di