Re: Migrating bayes to mysql fails with parsing errors

Dave Wreski Thu, 23 Jun 2011 09:41:00 -0700

Hi,

Since so many have problems, I'll share my MySQL schemas :=)
    `token` binary(5) NOT NULL,


Yes, the binary or varbinary is the key to a solution here.
Mucking with utf-8 vs latin-1 is just covering but not solving
the most glaring problem here, namely that a token must not be
associated with any character set, as it does not obey any
such rules, nor should it be treated case-insensitively
(as char is, which is possibly a reason for more than two
record changes as reported by Dave). Will take a closer look...
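
For illustration, the point about case-insensitive matching can be checked directly in MySQL, and the column can be switched to a binary type. This is only a sketch: it assumes the token table is named bayes_token (as in the stock SpamAssassin schema, not stated in this thread), that the session uses a case-insensitive default collation, and that the change is made on an empty table before learning.

```sql
-- Under a case-insensitive collation, two different "tokens" compare equal:
SELECT 'a' = 'A';                  -- 1 with a *_ci collation

-- Compared as raw bytes, they are distinct:
SELECT CAST('a' AS BINARY) = 'A';  -- 0

-- Assumption: table name bayes_token; column definition taken from the
-- schema line quoted above. Storing the token as raw bytes means no
-- character set or collation is applied to it.
ALTER TABLE bayes_token
  MODIFY `token` BINARY(5) NOT NULL;

-- Confirm the resulting column type.
SHOW CREATE TABLE bayes_token;
```

With a binary column, the table's DEFAULT CHARSET and COLLATE settings no longer affect how tokens are stored or compared.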

I changed the "Type=MyISAM" at the end of each CREATE statement in the original schema and replaced it with the following from Benny's schema:


ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

It's now working, but it is excruciatingly slow. Is this also just covering up the problem, or will this be a usable solution when it finally finishes?

Does it make a difference if I learn with MyISAM and then convert to InnoDB after it finishes? I could train it using the original spam/ham, but I fear it will be just as slow, and hand-scanning for a corpus again is obviously a more difficult process.
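
For reference, converting after the fact would look roughly like the sketch below. It assumes the table is named bayes_token (as in the stock SpamAssassin schema, not stated in this thread); the other Bayes tables would need the same statement. Whether learning into MyISAM first is actually faster has not been measured here.

```sql
-- Check which storage engine the Bayes tables currently use.
SHOW TABLE STATUS LIKE 'bayes%';

-- Assumption: table name bayes_token. This rebuilds the table in place
-- with the InnoDB engine, so it can take a while on a large token table.
ALTER TABLE bayes_token ENGINE=InnoDB;
```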


Thanks,
Dave
