When you say RISC-V, I assume you are targeting the new Chinese CPUs. Tesseract is written entirely in portable C and C++, which gives us the following assumptions and guesstimates for a risk and effort analysis of a porting project like yours (by the sound of it):
- I expect (assume) a Linux-based operating system and development environment is already present. If not, tackle that first. I say "Linux" because that's the mainstream POSIX-compliant UNIX flavor one expects to see on modern boards (CPU, memory, I/O, all the usual stuff you'll find on advanced dev boards). Tesseract doesn't *require* it *per se*, but everything will go much easier. So prerequisite #1: you've got a working, modern Linux on your rig.

- I expect a working C/C++ compiler for your hardware. It must support C++17 to be safe, preferably GNU GCC or clang/LLVM, as that's what most people use. You have cmake, make, GNU autotools, perl and python, all working on your rig. If not, tackle that first. The latter two (perl and python) are not mandatory, but when you want to run training of any kind, i.e. a FULL PORT of tesseract, you'll be happy to have them.

  Personally, I use Microsoft's MSVC2022 and I'm the odd one out. (Though we could argue @stweil clearly uses the same Windows-based dev environment as he's published tesseract installers for years ;-) )

  Others may mention "cross compiling" and yes, that's an option, but I haven't seen anyone do or try that recently with tesseract. Again, you want to hug mainstream as much as possible here, as you want to tackle *other* issues, and it's encouraging to know you have a platform that looks and sounds a lot like the regular usage pattern out there, apart from your particular CPU, that is.

- Next: porting! As "everything" is done in portable C/C++, that job sounds "done". It is NOT. What *your* work now entails is porting the very important SIMD-optimized code parts (matrix calculus) at the core of tesseract. Here you will have to produce a variant of optimized code that can co-exist next to the existing variants already included in tesseract. 'grep' for "AVX", "SSE", "FMA" (all caps) in the tesseract source code to find the spots that need your work.
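
To make the "co-existing variants" idea a bit more concrete, here is a minimal C++ sketch of the pattern you will be hunting for: a generic portable routine, a slot for a new optimized variant, a runtime feature check, and a dispatcher that picks between them. All names (DotProductGeneric, DotProductRvv, DetectRiscvVector, SelectDotProduct) are made up for illustration and are not tesseract's actual identifiers. The AT_HWCAP check assumes a Linux kernel that reports the 'V' extension as a single-letter bit, and the "optimized" function is only a stub that forwards to the generic code.

```cpp
// Sketch only: every name here is hypothetical, not tesseract's real API.
#include <cstddef>

#if defined(__riscv) && defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP
#endif

using DotProductFn = double (*)(const double* u, const double* v, std::size_t n);

// Portable reference variant: plain C++ with no intrinsics.
double DotProductGeneric(const double* u, const double* v, std::size_t n) {
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}

// Placeholder for the new optimized variant. A real port would use RVV
// intrinsics here; this stub forwards to the generic code so the sketch
// builds and runs on any machine.
double DotProductRvv(const double* u, const double* v, std::size_t n) {
  return DotProductGeneric(u, v, n);
}

// Runtime feature detection. Assumption: on Linux/RISC-V the kernel reports
// single-letter ISA extensions in AT_HWCAP, one bit per letter ('a' is bit 0).
bool DetectRiscvVector() {
#if defined(__riscv) && defined(__linux__)
  const unsigned long hwcap = getauxval(AT_HWCAP);
  return (hwcap & (1UL << ('v' - 'a'))) != 0;
#else
  return false;  // Not RISC-V/Linux: always use the generic variant.
#endif
}

// Dispatcher: pick a variant once, then call through the function pointer.
DotProductFn SelectDotProduct() {
  return DetectRiscvVector() ? DotProductRvv : DotProductGeneric;
}
```

Calling SelectDotProduct() once at startup and keeping the returned pointer means the hot loop never repeats the feature check; on a non-RISC-V machine the sketch simply falls back to the generic variant.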
Tesseract comes with a non-optimized, generic, portable C variant of the same code, so the algorithm you need to provide fast code for is easy to read. DO NOT FORGET THE OTHER PART: the CPU detection and CPU capability/feature extraction function. That one determines whether tesseract will run your optimized code, so it is important to have that bit working as well. When you've found the SSE/AVX code chunks, dig *up* (instead of down) and you'll find the important CPU detect + dispatch calls easily.

This, thus far, describes the "low hanging fruit" of a regular port of tesseract to a new CPU. Given that you probably(?) have access to RVV/SVE on your hardware, you might want to address the entire BLSTM engine codebase from that new perspective. My advice: do that as a "stage 2" in your project flow and also separate it out in your proposal, as this will be a much larger and more difficult task. I see lots of profiler runs and other investigative work in your future. ;-)

...

That should set you up & going.

Tasks:

- Check the listed assumptions: make those match.
- Tesseract GitHub fork, code inspection. The 'grep' bit. (Here I assume you understand what I mean when I only use the word: grep; otherwise this email becomes a book)
- Write CPU feature detector code, patch the dispatcher code, add a RISC-V optimized code variant next to the existing AVX/SSE variants. Integrate the new code in the tesseract build scripts. TEST. Run a few images through your new tesseract. REPRODUCE (on other comparable hardware).
- If you also want to TRAIN tesseract, look for Shree's work, the tesstrain GitHub repo and all the other tesseract organisation's repos on GitHub; lurk, read, **a lot**.

Personally I haven't done any tesseract training yet, because I could get around that task. I'm self-financed (paying for myself on my own dime), which is ultra-rare, and from what I've seen so far, training is "no fun" unless you have direct and unrestricted access to nice amounts of top-of-the-line industrial hardware.
(I am very patient, but only with some people and far fewer machines. ;-) )

Hope that helps,

Cheerio,

Ger

On Thu, 16 Nov 2023, 09:50 yuxuan wang, <wangyuxuan1...@gmail.com> wrote:

> Hello everyone! I am working on implementing a tool to assess the
> complexity of CPU architecture porting. It primarily focuses on RISC-V
> architecture porting. In fact, the tool may have an average estimate of
> various architecture porting efforts. My focus is on the overall workload
> and difficulty of transplantation in the past and future, even if a project
> has already been ported. As part of my dataset, I have collected the
> **tesseract** project. **I would like to gather community opinions to
> support my assessment. I appreciate your help and response!** Based on
> scanning tools, the porting complexity is determined to be simple, with a
> small amount of code related to the CPU architecture in the project. Is
> this assessment accurate? Do you often have any opinions on personnel
> allocation and consumption time? I look forward to your help and response.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/4b84ce8d-a316-4850-9240-5e2a87b50eb7n%40googlegroups.com