Hi. You seem to be missing a lot of input. Please take a look at Tesstrain <https://github.com/tesseract-ocr/tesstrain>, and particularly its Makefile, so that you know what is involved in the training process. I would go over the official Tesstrain documentation and run "make help" to see the input needed. One of the many items you have not specified is the CNN-LSTM network specification (the --net_spec option of lstmtraining), which you can ask GPT/Claude to explain to you.
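As a minimal sketch in Python (like the quoted script), the missing piece can be made explicit by always giving lstmtraining one of two things. For training from scratch, that is a network spec (--net_spec). For fine-tuning, it is an existing network to continue from (--continue_from).

The layer string below is an assumption modeled on the examples in the Tesseract training documentation, not a recommendation. The helper names are hypothetical.

For training from scratch, --traineddata would normally be a starter traineddata built with combine_lang_model, as Tesstrain does, rather than the stock deu.traineddata. For fine-tuning, the .traineddata needs to be a float model such as those in tessdata_best.

```python
import shlex
from typing import List, Optional

# Assumed layer string, modeled on the examples in the Tesseract training docs.
# "c###" is a placeholder for the number of output classes; Tesstrain fills it
# in from the first line of the unicharset it generates.
NET_SPEC_TEMPLATE = "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c###]"


def read_unicharset_size(unicharset_path: str) -> int:
    """Return the entry count stored on the first line of a unicharset file."""
    with open(unicharset_path, encoding="utf-8") as f:
        return int(f.readline().split()[0])


def lstmtraining_command(
    model_output: str,
    traineddata: str,
    train_listfile: str,
    max_iterations: int,
    num_classes: Optional[int] = None,
    continue_from: Optional[str] = None,
) -> List[str]:
    """Build an lstmtraining argument list that always defines a network.

    Pass num_classes to train from scratch (adds --net_spec) or continue_from
    to fine-tune an existing LSTM model (adds --continue_from). Passing
    neither (as the quoted script effectively does) is rejected, since it
    leaves lstmtraining with an empty network spec.
    """
    if (num_classes is None) == (continue_from is None):
        raise ValueError("pass exactly one of num_classes or continue_from")
    cmd = [
        "lstmtraining",
        "--model_output", model_output,
        "--traineddata", traineddata,
        "--train_listfile", train_listfile,
        "--max_iterations", str(max_iterations),
    ]
    if continue_from is not None:
        cmd += ["--continue_from", continue_from]
    else:
        net_spec = NET_SPEC_TEMPLATE.replace("c###", f"c{num_classes}")
        cmd += ["--net_spec", net_spec]
    return cmd


if __name__ == "__main__":
    # Hypothetical fine-tuning run using the paths from the quoted script;
    # deu.lstm would first be extracted with:
    #   combine_tessdata -e ./tesseract-4.1/tessdata/deu.traineddata ./deu.lstm
    cmd = lstmtraining_command(
        model_output="./output/font_name",
        traineddata="./tesseract-4.1/tessdata/deu.traineddata",
        train_listfile="./trainingsliste.txt",
        max_iterations=200,
        continue_from="./deu.lstm",
    )
    # Print a shell-safe version; subprocess.run(cmd, check=True) would run it.
    print(shlex.join(cmd))
```

For a from-scratch run, num_classes would typically come from read_unicharset_size() applied to the unicharset generated for your data.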
Furthermore, you can use GPT or Claude to digest the Makefile for you so that you know which binaries are invoked during the different steps of the training process. Once you have found the binaries involved, run something like "lstmtraining --help" for each one to see its complete list of options, some of which are not specified in the Tesstrain Makefile.

Once you digest the Tesstrain Makefile, it will become clear that, messy as it may be, it is just an ugly wrapper that runs various Tesseract binaries in sequence, which is similar to what you were trying to achieve. You can then (use GPT/Claude to) tailor the Makefile to your needs, or even turn it into an equivalent Python script that is easier to modify. This is almost certainly necessary if your training set is very large.

On Monday, April 22, 2024 at 2:08:09 PM UTC-4 testc...@gmail.com wrote:

> Hi,
> I am trying to train a Tesseract model with my own data. This is my code:
>
> import os
>
> # Configure paths
> TRAIN_DATA_DIR = "./data1"
> TRAIN_LISTFILE = "./trainingsliste.txt"
> OUTPUT_DIR = "./output"
> TRAINEDDATA = "./tesseract-4.1/tessdata/deu.traineddata"
>
> # Check required paths
> if not os.path.exists(TRAIN_DATA_DIR) or not os.path.exists(TRAIN_LISTFILE) or not os.path.exists(TRAINEDDATA):
>     raise FileNotFoundError("One or more required directories/files are missing.")
>
> # Create the output directory if it does not exist
> if not os.path.exists(OUTPUT_DIR):
>     os.makedirs(OUTPUT_DIR)
>
> # Training configuration
> MAX_ITERATIONS = 200
> os.environ['OMP_THREAD_LIMIT'] = '16'
>
> # Training command
> command = f'lstmtraining --model_output {OUTPUT_DIR}/font_name --traineddata {TRAINEDDATA} --train_listfile {TRAIN_LISTFILE} --max_iterations {MAX_ITERATIONS}'
> result = os.system(command + " > train_output.txt 2>&1")
> print("Executed command:", command)
>
> if result != 0:
>     with open('train_output.txt', 'r') as file:
>         output = file.read()
>     print("Error during training:", output)
>     raise Exception("Error starting the training process.")
>
> and this is the error:
>
> Must specify an input layer as the first layer, not !!
> Failed to create network from spec:
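The reply above describes the Makefile as a wrapper that runs Tesseract binaries in sequence. To make that concrete, here is a rough Python sketch of such a wrapper for a simple fine-tuning job.

It assumes the following:

* The .lstmf files listed in trainingsliste.txt already exist.
* deu.traineddata is a float ("best") model.

The function names are hypothetical. A real Tesstrain run involves more steps, such as generating the .lstmf files, building a unicharset and starter traineddata, and evaluation. The sketch uses subprocess instead of os.system so that each step gets its own log and the run stops at the first failure.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence, Tuple

# A step is a short name (used for its log file) plus the argv to execute.
Step = Tuple[str, List[str]]


def fine_tune_steps(lang: str, traineddata: str, train_listfile: str,
                    output_dir: str, model_name: str,
                    max_iterations: int) -> List[Step]:
    """Return, in order, the commands for a simple fine-tuning job.

    Assumes `traineddata` is a float ("best") model and that the .lstmf
    files listed in `train_listfile` already exist.
    """
    out = Path(output_dir)
    lstm = str(out / f"{lang}.lstm")
    model = str(out / model_name)
    return [
        # 1. Extract the LSTM network from the existing traineddata.
        ("extract", ["combine_tessdata", "-e", traineddata, lstm]),
        # 2. Continue training that network on the new data.
        ("train", ["lstmtraining",
                   "--continue_from", lstm,
                   "--traineddata", traineddata,
                   "--train_listfile", train_listfile,
                   "--model_output", model,
                   "--max_iterations", str(max_iterations)]),
        # 3. Turn the final checkpoint into a usable .traineddata file.
        ("finalize", ["lstmtraining", "--stop_training",
                      "--continue_from", f"{model}_checkpoint",
                      "--traineddata", traineddata,
                      "--model_output", f"{model}.traineddata"]),
    ]


def run_steps(steps: Sequence[Step], log_dir: str) -> None:
    """Run each step in order, writing its output to <log_dir>/<name>.log.

    Stops with RuntimeError at the first non-zero exit code; a missing
    binary surfaces as FileNotFoundError from subprocess.
    """
    logs = Path(log_dir)
    logs.mkdir(parents=True, exist_ok=True)
    for name, argv in steps:
        log_file = logs / f"{name}.log"
        with open(log_file, "w", encoding="utf-8") as log:
            result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
        if result.returncode != 0:
            raise RuntimeError(
                f"step {name!r} failed (exit {result.returncode}); see {log_file}")


# Example with the paths from the quoted script (not executed here):
#   run_steps(fine_tune_steps("deu", "./tesseract-4.1/tessdata/deu.traineddata",
#                             "./trainingsliste.txt", "./output", "font_name", 200),
#             "./output")
```

Using the output directory as the log directory also ensures it exists before combine_tessdata writes deu.lstm into it. Each further Tesstrain stage would just be another (name, argv) entry in the list.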