On 13.08.2010 23:19, Robert Komar wrote:
> On Fri, 13 Aug 2010, zdenko podobny wrote:
>
>> Because AFAIK nobody reacted to Catalin's e-mail, I offered to create a
>> project to collect patches and possibly to solve known issues. Because
>> I have little time, the project is still looking for an owner and
>> contributors. Python (multi-platform) GUI (GTK/Qt/wx...) experts are
>> especially welcome, because of performance issues: on Windows XP (2 GB
>> memory) the script crashes or freezes while opening a file with a lot
>> of boxes/symbols (e.g. eng.arial.g4.tif), and on Mandriva Linux 2010.1
>> 64-bit (6 GB memory) it takes 15 minutes to open and display it!
>
> 15 minutes! You need to do some profiling on your code to see
> where it's spending all its time.
>
> http://docs.python.org/library/profile.html
>

For the moment I have not found a problem in the "algorithm" part of the code ;-). I see the problem in the "display" (PyGTK) part. The script creates a gtk.Entry for each box and packs it into an hbox container. So for the eng.arial.g4.box file it creates 4968 UI elements for the boxes, plus a number of gtk.Label widgets for the spaces between words/groups of symbols. I am not sure any UI toolkit can handle that many elements in reasonable time with reasonable resources. That is why the project needs a GUI expert to suggest a more efficient approach.
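Robert's profiling suggestion can be tried with the standard library's cProfile and pstats modules (both are covered by the profile documentation linked above). The sketch below is only an illustration under stated assumptions: parse_boxes, build_widgets, load_and_display, profile_load and FakeEntry are hypothetical names, not code from pyTesseractTrainer; FakeEntry stands in for gtk.Entry so the example runs without GTK; and box lines are assumed to follow the Tesseract layout "char left bottom right top [page]". The input is synthetic, sized like eng.arial.g4.box, so it says nothing about real timings.

```python
import cProfile
import io
import pstats


class FakeEntry:
    """Hypothetical stand-in for a gtk.Entry widget (no GUI needed here)."""

    def __init__(self, text):
        self.text = text


def parse_boxes(lines):
    """Parse box lines assumed to look like 'char left bottom right top [page]'."""
    boxes = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        char = fields[0]
        left, bottom, right, top = (int(v) for v in fields[1:5])
        boxes.append((char, left, bottom, right, top))
    return boxes


def build_widgets(boxes):
    """Create one widget per box, mirroring the one-entry-per-box design."""
    return [FakeEntry(box[0]) for box in boxes]


def load_and_display(lines):
    """Hypothetical entry point: parse the box file, then build the UI."""
    return build_widgets(parse_boxes(lines))


def profile_load(lines, top=10):
    """Run load_and_display under cProfile; return the widgets and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    widgets = load_and_display(lines)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the slowest top-level step is listed first.
    stats.sort_stats("cumulative").print_stats(top)
    return widgets, out.getvalue()


if __name__ == "__main__":
    # Synthetic input with as many boxes as eng.arial.g4.box (4968).
    sample = ["a 10 20 30 40 0"] * 4968
    widgets, report = profile_load(sample)
    print(len(widgets), "widgets created")
    print(report)
```

In the real script one would wrap the actual file-loading and widget-building code instead; the cumulative column then shows whether parsing or widget creation dominates.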
I also wonder whether there is an issue on (my ;-) ) Linux. When I open an A5 image scan with 1627 boxes on Windows, it is displayed within a few seconds... but on Linux it took 1 min 45 sec... But this is more for discussion on http://groups.google.com/group/pytesseracttrainer-users ;-)

Just take this as a warning if you are an end user, or a challenge if you are a programmer :-)

Zd.

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.