Hello, may I ask which version you are using? Would you mind sharing your chi_sim and chi_sim_vert files?
On Sunday, March 17, 2024 at 00:41:13 UTC+8, j.w.p...@gmail.com wrote:
> Hello,
>
> I am making a transcript of YT videos using Tesseract.
> The images I input to Tesseract look like this:
> [image: aftercut29.0.jpg]
>
> The output is mostly correct, but sometimes the same character gives
> different outputs.
> Example:
> Input:
> [image: aftercut3.0.jpg]
> Output: 大*叔*中文 - CORRECT
>
> Input:
> [image: aftercut10.5.jpg]
> Output: 今天不是3位 大*档* - INCORRECT
>
> To prepare the images I use:
>
> - *dilation*,
> - *cropping the area* of the image containing characters,
> - I add *borders*.
>
> For dilation I use a 2x2 kernel, and the border is 2px thick.
> For the page segmentation mode I am currently experimenting with *--psm 7*
> and *--psm 13*. --psm 7 seems to give slightly better results. Of course the
> language is: *lang='chi_sim'*
>
> Could you give me any advice on how to improve the robustness of the output?
>
> Thank you in advance,
> Jan
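For anyone trying to reproduce the setup in the quoted message, here is a minimal sketch of that pipeline in Python. It is not the original poster's code. It assumes OpenCV and NumPy for the preprocessing, dark text on a light background, and the `tesseract` command-line tool on the PATH. The function names (`preprocess`, `tesseract_cmd`, `ocr`, `compare_psm`) are made up for illustration.

```python
import subprocess


def preprocess(src_path, dst_path, border=2):
    """Dilate, crop to the text area, and add a border (steps from the quoted message).

    Assumption: dark text on a light background. Invert the image first otherwise.
    """
    # Imported here so the Tesseract helpers below work without OpenCV installed.
    import cv2
    import numpy as np

    gray = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(src_path)

    # Otsu threshold, inverted so that text pixels become white (255).
    _, text = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Dilation with a 2x2 kernel thickens the strokes.
    text = cv2.dilate(text, np.ones((2, 2), np.uint8))

    # Crop to the bounding box of all text pixels.
    points = cv2.findNonZero(text)
    if points is None:
        raise ValueError("no text pixels found in " + src_path)
    x, y, w, h = cv2.boundingRect(points)
    cropped = cv2.bitwise_not(text[y:y + h, x:x + w])  # back to dark-on-light

    # Pad with a white border of `border` pixels on every side.
    padded = cv2.copyMakeBorder(
        cropped, border, border, border, border, cv2.BORDER_CONSTANT, value=255
    )
    cv2.imwrite(dst_path, padded)


def tesseract_cmd(image_path, lang="chi_sim", psm=7):
    """Build the command line; 'stdout' makes Tesseract print the text."""
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr(image_path, lang="chi_sim", psm=7):
    """Run Tesseract on one image and return the recognized text."""
    result = subprocess.run(
        tesseract_cmd(image_path, lang, psm),
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout.strip()


def compare_psm(image_path, modes=(7, 13)):
    """Return {psm: text} so the two segmentation modes can be compared."""
    return {psm: ocr(image_path, psm=psm) for psm in modes}
```

Calling `compare_psm("aftercut10.5.jpg")` on a preprocessed frame would show side by side where `--psm 7` and `--psm 13` disagree. Running `tesseract --version` and `tesseract --list-langs` reports the installed version and the available language data.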