1) Am I right in thinking that I can run sa-learn --spam on a folder that contains spam, most of which already has SpamAssassin headers marking it as such, and that sa-learn knows to disregard the SpamAssassin headers (or all headers, for that matter)?
SA's Bayes subsystem tracks which message IDs it has already learned from and what each was learned as. It will not re-learn the same message unless you tell SA to change what it was learned as.
SA can (and does) learn useful information from mail already tagged as spam, so feeding tagged mail to sa-learn is good, not redundant. sa-learn will only skip messages it has already learned or autolearned.
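To make the "learn once, unless the classification changes" behaviour concrete, here is a minimal toy model in Python. It is only a sketch of the idea described above, not SA's actual implementation: the class name SeenTracker, the learn() method, and the returned strings are all hypothetical, and the real Bayes store is considerably more involved.

```python
class SeenTracker:
    """Toy model (hypothetical, not SA code) of a 'seen' table
    that maps a message ID to what it was last learned as."""

    def __init__(self):
        # message ID -> "spam" or "ham"
        self.seen = {}

    def learn(self, msg_id, label):
        previous = self.seen.get(msg_id)
        if previous == label:
            # Already learned with the same classification: skip it.
            return "skipped"
        # Either new, or the caller is changing the classification.
        self.seen[msg_id] = label
        return "learned" if previous is None else "relearned"


tracker = SeenTracker()
first = tracker.learn("<abc@example.com>", "spam")    # new message
second = tracker.learn("<abc@example.com>", "spam")   # same message, same label
third = tracker.learn("<abc@example.com>", "ham")     # same message, new label
```

In this model, feeding the same spam folder to the learner twice is harmless: the second pass just skips everything it has already seen.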
sa-learn will automatically ignore headers generated by SA itself. You can specify a bayes_ignore_header in your local.cf to make it ignore headers added by other tools.
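As a rough illustration, a local.cf fragment for this might look like the following. The header names are only examples of headers another filter might add (X-Bogosity is what bogofilter uses; X-Upstream-Spamfilter is a made-up placeholder); substitute whatever your upstream tools actually insert.

```
# local.cf (example only; header names are assumptions)
# Keep Bayes from learning tokens out of headers added by other filters.
bayes_ignore_header X-Bogosity
bayes_ignore_header X-Upstream-Spamfilter
```

One bayes_ignore_header line is needed per header you want ignored.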
2) How will the Bayesian checking affect load? I have tweaked things so that my servers are currently hitting 0-5% idle during peak, and anything more will probably make them fall over.
Bayes adds quite a bit of load, but if you're using some insanely large rulesets (i.e. anything over 256 KB), it's insignificant by comparison.
3) How will Bayes affect the need for some of the rulesets I have? No, strike that.
3b) How does Bayes affect rulesets from, say, exit0/rulesemporium? Can any be done away with? Are any made practically obsolete by a well-trained Bayes database?
Theoretically, any and all rules can be made obsolete by a well-trained Bayes DB. The other rules exist to balance out the amount of work needed to get good results. You can get great results from a Bayes-only system, but you've got to train it heavily and constantly.
SA's rules pick up the slack if you're not training 200 spams and 200 hams a day every day.
4) Anything else I should be looking into?
Hardware upgrades so you can run some more CPU intensive stuff? :)