The loss function given here <https://spark.apache.org/docs/1.6.0/mllib-linear-methods.html#mjx-eqn-eqregPrimal> for logistic regression is confusing. It seems to imply that Spark uses only -1 and +1 as class labels. However, Spark actually uses 0 and 1, as the very inconspicuous note quoted below (under Classification) says. We need to make this point more visible to avoid confusion.
Better yet, we should replace the listed loss function with the one for 0/1 labels, no matter how mathematically inconvenient, since that is what Spark actually implements.

More problematic, the loss function (even in this "convenient" form) appears to be incorrect: it is missing either a summation (sigma) over the log terms or a product (pi) inside the log, since the loss for logistic regression is the negative log-likelihood over all training examples.

So there are multiple problems with the documentation. Please advise on the steps to fix this in the documentation for all versions, or whether fixes are already in progress.

The note in question reads:

"Note that, in the mathematical formulation in this guide, a binary label y is denoted as either +1 (positive) or −1 (negative), which is convenient for the formulation. *However*, the negative label is represented by 0 in spark.mllib instead of −1, to be consistent with multiclass labeling."
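
To make the relationship between the two label conventions concrete, here is a minimal sketch in plain Python (not Spark code) that evaluates the logistic loss both ways. The function names `loss_pm1`, `loss_01`, and `mean_loss`, and the sample margins and labels, are hypothetical and chosen only for illustration; `margin` stands for the linear score w^T x. The sum over examples is written out explicitly in `mean_loss`.

```python
import math

def loss_pm1(margin, y):
    """Per-example logistic loss for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * margin))

def loss_01(margin, y):
    """Per-example cross-entropy for a label y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-margin))  # sigmoid of the linear score
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mean_loss(loss, margins, labels):
    """Average a per-example loss over the data set (the explicit sum)."""
    return sum(loss(m, y) for m, y in zip(margins, labels)) / len(labels)

# Hypothetical linear scores and labels, for illustration only.
margins = [-2.0, -0.5, 0.0, 1.5, 3.0]
labels_01 = [0, 1, 0, 1, 1]
labels_pm1 = [2 * y - 1 for y in labels_01]  # map 0 -> -1 and 1 -> +1

total_pm1 = mean_loss(loss_pm1, margins, labels_pm1)
total_01 = mean_loss(loss_01, margins, labels_01)
print(abs(total_pm1 - total_01) < 1e-12)
```

Under the mapping y_pm1 = 2 * y_01 - 1, the two forms agree up to floating-point error, so the documentation could present either one as long as it states the label convention clearly and shows the sum over examples.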