I have a dataset of ~200k labeled points, each represented as a
SparseVector with ~20M features. I take 5% of the data as a training set.
Running
> model = LogisticRegressionWithSGD.train(training_set)
fails with:
ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
  File "/cluster/home/roskarr/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 472, in send_command
    self.socket.sendall(command.encode('utf-8'))
  File "/cluster/home/roskarr/miniconda/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe
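To give a sense of scale, here is a rough back-of-envelope sketch of the sizes involved. It uses the approximate figures quoted above. The dense-vector estimate is only an assumption about what might get allocated or transferred somewhere along the way (for example a weight vector spanning all features), not a confirmed cause of the error.

```python
# Rough sizes for the setup described above (approximate figures from the post).
n_points = 200000        # ~200k labeled points
n_features = 20000000    # ~20M features per SparseVector
train_percent = 5        # 5% of the data used for training

# Integer arithmetic avoids floating-point rounding surprises.
n_train = n_points * train_percent // 100

# Assumption: one dense vector of 8-byte doubles covering every feature.
bytes_per_double = 8
dense_vector_mb = n_features * bytes_per_double / 1e6

print("training points: %d" % n_train)
print("one dense vector over all features: %.0f MB" % dense_vector_mb)
```

So the training set itself is small in number of points, but anything that is dense in the feature dimension would be large.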
I'm at a loss as to where to begin to debug this... any suggestions? Thanks,
Rok