Hi everyone:
I am having several problems with an algorithm for MLlib that I am
developing. It uses large broadcast variables across many iterations and
Breeze vectors as RDDs. The problem is that in some stages the Spark
program freezes without any notification. I have tried to reduce the use of
broadcasting and the size of the broadcast variables (from hash tables to
simple byte arrays), but the problem reappears at other points in the code.
The code is here:
https://github.com/sramirez/SparkFeatureSelection/blob/efficient-fs/src/main/scala/org/apache/spark/mllib/feature/InfoTheory.scala
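To make the pattern clearer, here is a minimal sketch of the
broadcast-per-iteration loop I described above. It is not the actual code
linked above; the object name, "table", the iteration count and the dummy
update are illustrative only:

import org.apache.spark.{SparkConf, SparkContext}

// Simplified sketch of the iterative broadcast pattern (illustrative names only).
object BroadcastLoopSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-loop-sketch"))

    val data = sc.parallelize(0 until 1000000).cache()
    // Large side data, already reduced from a hash table to a plain byte array.
    var table: Array[Byte] = Array.fill(10 * 1024 * 1024)(0: Byte)

    for (i <- 0 until 10) {
      // Re-broadcast the (possibly updated) side data at every iteration.
      val bTable = sc.broadcast(table)

      // A stage that reads the broadcast on the executors.
      val partial = data
        .map(x => bTable.value(x % bTable.value.length).toInt)
        .reduce(_ + _)

      // Explicitly release the previous broadcast so the executors and the
      // driver can free it; keeping one broadcast alive per iteration in a
      // long loop adds up quickly.
      bTable.unpersist(blocking = true)

      // Dummy update of the side data for the next iteration.
      table = table.map(b => (b ^ (partial & 1)).toByte)
    }

    sc.stop()
  }
}

As far as I understand, unpersisting the old broadcast between iterations
is the recommended way to avoid accumulating broadcast blocks, but I may
well be missing something here.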
There is a JIRA issue related to mine:
https://issues.apache.org/jira/browse/SPARK-5363
It seems to be fixed, but that is not entirely clear to me. Although the
issue is reported against PySpark, the problem also seems to reproduce in Scala.
I have tried several Spark versions: 1.2.0, 1.3.1, and 1.4.0.
I would appreciate any clue or advice.
Thanks,
Sergio R.