For anyone interested, I largely resolved the performance issues with
sa-learn training when using txrep with a little MySQL server tuning.
As a reference point, training with ~6400 messages (most of which had
already been learned) took about 14 minutes with both txrep and bayes, and
about 3.5 minutes l
One thing that suggests the training logic may need reworking is
that I have txrep_track_messages at its default (1), and almost every
message in my corpus has already been trained; each run brings in only a
handful of new messages (usually 10-20, but often 0, and always < 100).
It sure seems
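For a rough sense of scale, the figures above can be turned into a back-of-the-envelope calculation. This is a sketch, not a measurement. It assumes every message costs the same wall-clock time, whether or not it was already learned. That assumption is mine, and the numbers are the approximate ones quoted above (~6400 messages, ~14 minutes for txrep+bayes, fewer than 100 new messages per run).

```python
# Back-of-the-envelope numbers for one sa-learn run with txrep+bayes.
# ASSUMPTION: every message costs the same time, new or already learned.
TOTAL_MESSAGES = 6400      # approximate corpus size per run (from the post)
RUN_SECONDS = 14 * 60      # ~14 minutes for txrep+bayes (from the post)


def per_message_seconds(total_messages: int, run_seconds: float) -> float:
    """Average wall-clock cost per message under the uniform-cost assumption."""
    return run_seconds / total_messages


def redundant_fraction(total_messages: int, new_messages: int) -> float:
    """Fraction of the corpus that had already been learned on this run."""
    return (total_messages - new_messages) / total_messages


if __name__ == "__main__":
    cost = per_message_seconds(TOTAL_MESSAGES, RUN_SECONDS)
    print(f"~{cost * 1000:.0f} ms per message")
    # 0, 20 and 100 new messages bracket the "usually 10-20, always < 100" range.
    for new in (0, 20, 100):
        frac = redundant_fraction(TOTAL_MESSAGES, new)
        wasted = (TOTAL_MESSAGES - new) * cost
        print(f"{new:>3} new -> {frac:.1%} already learned, "
              f"~{wasted / 60:.1f} min spent on them")
```

If the uniform-cost assumption held, more than 98% of each run would go to messages that txrep_track_messages is supposed to recognize as already seen.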