Adam, if you'd like to try these out I'd be very happy ;)

masses/bayes-testing/README in the SA svn repository
describes how we test new tokenization strategies, in order to
pick the ones that actually _work_.  (What really helps can be quite
counterintuitive at times.)

Also, there's experimental code to use multi-word tokenization as
part of the OSBF/Winnow plugin, but it's stalled due to a lack
of accuracy compared to the existing Bayes code.  If you're
curious, it lives here, or at least it would if Bugzilla were
working:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5686
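
For anyone following along, here is a rough Python sketch of the kind of
normalization shown in the quoted examples below.  It is not Adam's code
and not anything in SpamAssassin: the helper name normalize(), the rule
that all punctuation is simply dropped, and the three-character minimum
for a "spaced" run are assumptions chosen only to reproduce the quoted
output.

```python
import re

# Assumption: any punctuation (and underscore) is treated as obfuscation
# and dropped; letters, digits and whitespace are kept as-is.
_PUNCT = re.compile(r"[^\w\s]|_")

# Assumption: three or more single characters, each separated by exactly
# one space and bounded by whitespace or line edges, form one spaced word.
_SPACED = re.compile(r"(?<!\S)\w(?: \w){2,}(?!\S)")


def normalize(line):
    """Hypothetical helper: strip punctuation, then append spaced:<word> tokens."""
    stripped = _PUNCT.sub("", line)
    extra = ["spaced:" + m.group(0).replace(" ", "")
             for m in _SPACED.finditer(stripped)]
    return " ".join([stripped] + extra)


if __name__ == "__main__":
    for sample in (
        "vi'aqra pr,ofe'ssio,nal matters very much to your s.e,x",
        "test  s p a c e d  words  t w i c e in a line",
        "this is an act--i've shown it 5 x, a record!",
    ):
        print(normalize(sample))
```

The original text is kept and the spaced:<word> tokens are only appended,
so a Bayes-style learner would see both forms.  Whether that helps
accuracy is exactly the sort of thing the README procedure above is
meant to measure.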

--j.

On Tue, May 12, 2009 at 21:22, Adam Katz <antis...@khopis.com> wrote:
> Adam Katz wrote:
>>> vi'aqra pr,ofe'ssio,nal matters very much to your s.e,x
>>> be self-satisfied - use vi'aqra s<u>per act,i've
>>> vi'aqra pr<o>fessional - never forget about your s'e.x
>>> test  s p a c e d  words  t w i c e in a line
>>> this is an act--i've shown it 5 x, a record!
>
> Ignore the missing /^test  / below ... I truncated it for wrapping...
>
>>> viaqra professional matters very much to your sex
>>> be selfsatisfied  use viaqra super active
>>> viaqra professional  never forget about your sex
>>> s p a c e d  words  t w i c e in a line spaced:spaced spaced:twice
>>> this is an active shown it 5 x a record spaced:5xa
>
>
