I just used random numbers. (My ML library was spark-mllib_2.10-1.2.1.) Please see the attached log.

In the middle of the log, I dumped the data set before feeding it into LogisticRegressionWithLBFGS. The first column (false/true) is the label (attribute “a”), and columns 2-5 (attributes “x”, “y”, “z”, and “i”) are the features. The 6th column is just a row ID and was not used. The relationship was chosen arbitrarily:

    a = (0.3 * x + 0.5 * y - 0.2 * z > 0.4)

After that, you can see LBFGS doing its job and then emitting the error messages.

The model showed these coefficients:

    396.57624765427323, x
    662.7969020937115, y
    -259.0975519038385, z
    12.352037503257826, i
    -538.8516249699426, @a

The last one is the intercept. As you can see, the model seems close enough: the x, y, and z weights and the intercept are roughly in the ratio 0.3 : 0.5 : -0.2 : -0.4, and the weight on “i”, which is not part of the rule, is small by comparison.

After that I fed the same data back into the model to see how the predictions worked (here attribute “a” is the prediction and “aa” is the original label). I only displayed 20 rows. The error rate showed 2 errors out of 1000:

    count(INTEGER), errorRate(DOUBLE), countDiff(INTEGER)
    key=[], rows=1
    1000, 0.0020000000949949026, 2

So the algorithm worked; it is just that the error messages it spits out are kind of annoying. If they do not affect the result, maybe they should be logged as warnings or info instead.

C.J.
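One way to check the "close enough" claim is to rescale the fitted coefficients so they can be compared directly with the generating rule. The Python sketch below is only an illustration: the variable names are made up, the numbers are copied from the coefficients quoted above, and it assumes the model predicts true when the linear score plus intercept is positive (a 0.5 probability threshold).

```python
# Fitted values copied from the message; "i" does not appear in the generating rule.
weights = {
    "x": 396.57624765427323,
    "y": 662.7969020937115,
    "z": -259.0975519038385,
    "i": 12.352037503257826,
}
intercept = -538.8516249699426

# Rule used to generate the labels: 0.3*x + 0.5*y - 0.2*z > 0.4
true_weights = {"x": 0.3, "y": 0.5, "z": -0.2, "i": 0.0}
true_threshold = 0.4

# Assumption: a logistic model only fixes the decision boundary up to a positive
# scale factor, so rescale until the fitted "y" weight equals the true one.
scale = true_weights["y"] / weights["y"]
rescaled = {name: w * scale for name, w in weights.items()}

# w.x + b > 0 is the same as w.x > -b, so the implied threshold is -b (rescaled).
rescaled_threshold = -intercept * scale

for name in weights:
    print(f"{name}: fitted {rescaled[name]:+.4f}  true {true_weights[name]:+.4f}")
print(f"threshold: fitted {rescaled_threshold:.4f}  true {true_threshold:.4f}")
```

After rescaling, the x, y, and z weights and the implied threshold each land within about 0.01 of the generating values, which is consistent with the 2-in-1000 error rate reported.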
Attachment: bleep.log (binary data)
On Mar 15, 2015, at 12:42 AM, DB Tsai <[email protected]> wrote:

In LBFGS version of logistic regression, the data is properly
