Yes, my change is slightly downstream of this point in the processing
though. The code is still creating a counter for each distinct score
value, and then binning. I don't think that would cause a failure; it
just might be slow. At the extremes, you might see 'fetch failure' as
a symptom of things ru
Sean's PR may be relevant to this issue
(https://github.com/apache/spark/pull/3702). As a workaround, you can
try to truncate the raw scores to 4 digits (e.g., 0.5643215 -> 0.5643)
before sending it to BinaryClassificationMetrics. This may not work
well if the score distribution is very skewed. See
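
A minimal sketch of the truncation workaround described above, written in
plain Python so it runs without a cluster. The helper name truncate_score
and the sample values are hypothetical; in Spark, the same function would
presumably be mapped over the (score, label) pairs before they are handed to
BinaryClassificationMetrics.

```python
import math

def truncate_score(score, digits=4):
    """Drop everything past `digits` decimal places (truncate, not round)."""
    factor = 10 ** digits
    return math.trunc(score * factor) / factor

# Hypothetical (score, label) pairs standing in for model output.
score_and_labels = [
    (0.5643215, 1.0),
    (0.5643999, 0.0),
    (0.5644001, 1.0),
    (0.1234567, 0.0),
]

truncated = [(truncate_score(s), label) for s, label in score_and_labels]

# Fewer distinct scores means fewer per-score counters downstream.
distinct_before = len({s for s, _ in score_and_labels})
distinct_after = len({s for s, _ in truncated})
print(distinct_before, distinct_after)
```

Truncating collapses nearby scores into one value, so the number of distinct
scores can only shrink. How much it shrinks depends on the data: if most
scores share the same leading digits, the reduction may come at the cost of
resolution, which is the skew caveat noted above.
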
Hi there,
We are using MLlib 1.1.1 and running logistic regression on a dataset of
about 150M rows.
The training part usually goes pretty smoothly without any retries. But
during the prediction stage and the BinaryClassificationMetrics stage, I am
seeing retries with a "fetch failure" error.
The p