I searched for this but didn't find anything general, so I apologize if it has already been addressed.
Many algorithms (e.g. SGD, SVM) either will not converge or will run for a very long time if the data is not scaled. scikit-learn has a preprocessing function <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html> that subtracts the mean and divides by the standard deviation, with a few options as well. Is there something like this in the works for Spark?
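
For concreteness, here is a minimal sketch of what such a transformation could look like today for a single feature held in an RDD of Doubles, using the stats() method that Spark already provides via DoubleRDDFunctions. The standardize helper is purely illustrative, not an existing Spark API:

    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    // Illustrative helper, not an existing Spark API: subtract the mean
    // and divide by the standard deviation, analogous to
    // sklearn.preprocessing.scale applied to one feature.
    def standardize(data: RDD[Double]): RDD[Double] = {
      val stats = data.stats()   // one pass: count, mean, stdev
      val mean = stats.mean
      val stdev = stats.stdev    // population standard deviation
      data.map(x => (x - mean) / stdev)
    }

Doing this per column of a feature matrix (and exposing options like centering only, or scaling only, as scikit-learn does) is where a built-in utility would help.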