Haven't checked your info further, but note your remark:
> *IMPORTANT*: I use images in same color palette: black background, white (close to gray) font, without any masks applied.

Please do NOT train with inverted imagery like that (white on black), particularly when you are working with existing models. Those have been trained to deal with book sources (black printed text on white paper), and feeding such a model inverted images and forcing it to learn those too can only lead to total model confusion and, consequently, headaches and "weird, inexplicable results".

Yes, when you dig deep ("rtfc") you'll find tesseract carries a bit of code to detect white-on-black inverted image inputs and invert those for you before feeding them to the core engine, but forget about that bit: it is only triggered under a set of very particular circumstances and (AFAIR) never in a training scenario.

TL;DR: any and all training is best done on black/dark text on a white/light background, as the code flow for both training images and OCR-ing (processing) images *implicitly* assumes this type of input. (When you use tesseract for a longer while, you will discover that feeding it white-on-black works just well enough to give you the idea that this might fly, but "weird shit" keeps happening in your decoded outputs and the hassle never goes away whatever you try, until you adjust your preprocessing to always pump out black-on-white, guaranteed, and then that "sometimes it's plain weird!" stuff ... just goes away. There are technical explanations for this, surely, but there are way too many ifs and buts for easy comprehension and a simple story.)

If, for instance, you plan to train and use tesseract for screen reader / subtitle work, where light text often occurs on black backgrounds, the above implies that your customized process MUST *invert* all source images, in both the training and the using/decoding paths, as the tesseract core is meant to receive black text on a white background, always, for optimal results.
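To make the "always pump out black-on-white" step concrete, here is a minimal sketch of such a preprocessing guard. It is not tesseract code and not anything from your setup: the function names and the median-based polarity check are my own assumptions for illustration, and the image is a plain list of rows of 0..255 grayscale values so the example has no dependencies. In a real pipeline you would load and save the images with whatever imaging library you already use.

```python
from statistics import median


def invert(pixels):
    """Return a new grayscale image with every value flipped (0 <-> 255)."""
    return [[255 - value for value in row] for row in pixels]


def ensure_dark_on_light(pixels, threshold=128):
    """Return a copy of the image that has dark text on a light background.

    Assumption (a heuristic, not tesseract's own logic): the background
    covers most of the image, so the median pixel value tells us whether
    the background is dark. If it is, the image is inverted.
    """
    flat = [value for row in pixels for value in row]
    if median(flat) < threshold:
        return invert(pixels)
    return [list(row) for row in pixels]


if __name__ == "__main__":
    # Tiny stand-in for a light-gray-on-black line image.
    light_on_dark = [
        [0, 0, 0, 0, 0],
        [0, 200, 200, 200, 0],
        [0, 0, 200, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    fixed = ensure_dark_on_light(light_on_dark)
    print(fixed[1])  # [255, 55, 55, 55, 255]
```

The point is that the same guard runs on every image in both paths, before generating training data and before OCR-ing, so the engine only ever sees one polarity.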
In your case, may I suggest re-running everything you did, but with inverted source, i.e. all your training images turned into black text on a white background? I expect this will deliver fewer "weird results" than you currently experience.

Take care,

Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------

On Sat, 5 Apr 2025, 16:08 Mitya, <mityaholi...@gmail.com> wrote:

> *Summary:*
> I decided to train on one source image (without any filters), but I am
> still getting a major issue, presumably with the set of commands used to
> train the model or (highly likely) in the area where we update
> eng.traineddata or interfere with checkpoints!
> Could you please take a look?
>
> Best Regards,
> Mitya
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/tesseract-ocr/260866d4-8131-4b62-86a3-e9bb88d18187n%40googlegroups.com