Dealing with a font that is so tightly kerned (kerning is the technical term 
for adjusting intercharacter spacing) is going to be difficult, I suspect.

One possibility might be to train the 74 combo as, effectively, a ligature 
and recognize it as a single symbol, but I have no idea a) whether you can 
invest this level of effort or b) whether it'll work.

Tom

P.S. There are also a number of kerning-related parameters, but I've never 
played with them:

$ tesseract --print-parameters | grep -E "kern|kn"
tosp_redo_kern_limit 10 No.samples reqd to reestimate for row
tosp_old_to_constrain_sp_kn 0 Constrain relative values of inter and intra-word gaps for old_to_method.
tosp_only_small_gaps_for_kern 0 Better guess
tosp_fuzzy_limit_all 1 Don't restrict kn->sp fuzzy limit to tables
tosp_rule_9_test_punct 0 Don't chng kn to space next to punct
tosp_flip_fuzz_kn_to_sp 1 Default flip
tosp_flip_fuzz_sp_to_kn 1 Default flip
tosp_old_sp_kn_th_factor 2 Factor for defining space threshold in terms of space and kern sizes
tosp_threshold_bias1 0 how far between kern and space?
tosp_threshold_bias2 0 how far between kern and space?
tosp_gap_factor 0.83 gap ratio to flip sp->kern
tosp_kern_gap_factor1 2 gap ratio to flip kern->sp
tosp_kern_gap_factor2 1.3 gap ratio to flip kern->sp
tosp_kern_gap_factor3 2.5 gap ratio to flip kern->sp
tosp_enough_small_gaps 0.65 Fract of kerns reqd for isolated row stats
tosp_table_kn_sp_ratio 2.25 Min difference of kn & sp in table
tosp_table_fuzzy_kn_sp_ratio 3 Fuzzy if less than this
tosp_fuzzy_kn_fraction 0.5 New fuzzy kn alg
tosp_min_sane_kn_sp 1.5 Don't trust spaces less than this time kn
tosp_init_guess_kn_mult 2.2 Thresh guess - mult kn by this
tosp_max_sane_kn_thresh 5 Multiplier on kn to limit thresh
tosp_flip_caution 0 Don't autoflip kn to sp when large separation
tosp_large_kerning 0.19 Limit use of xht gap with large kns
tosp_dont_fool_with_small_kerns -1 Limit use of xht gap with odd small kns
tosp_silly_kn_sp_gap 0.2 Don't let sp minus kn get too small
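
If you do want to experiment, parameters like these can be overridden for a 
single run with tesseract's -c name=value option. Below is a minimal sketch 
of building such a command line. The file names and the two values are 
placeholders I made up for illustration, not recommendations, and I haven't 
verified that any of these parameters affects the 74 case.

```python
import shlex


def tesseract_command(image, output_base, overrides):
    """Build a tesseract argv list with one `-c name=value` pair per override."""
    cmd = ["tesseract", image, output_base]
    for name, value in overrides.items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


# Hypothetical file names and illustrative values only.
cmd = tesseract_command(
    "digits.png",
    "out",
    {"tosp_min_sane_kn_sp": 1.0, "tosp_fuzzy_kn_fraction": 0.3},
)

# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Changing one parameter at a time and comparing the output against the 
defaults listed above is probably the only way to tell whether any of them 
matters here.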
