I'm starting to develop ADMM for some models using PySpark (Spark 1.0.0), so I constantly simulate data to test my code. I did the simulation in Python, but then I ran into the same kind of problem mentioned earlier in this thread: the same meaningless error messages show up when I try methods like first, take, or takeSample. There is no "Out of Memory" error, so the data size should not be the problem for PySpark.
Again, this is not a problem in Scala. I also installed Spark 0.9.1 and tried it there, and the same code runs correctly in the older version's PySpark. So the problem appears to be specific to PySpark in 1.0.0.

My code for data simulation:

-Congrui

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-Failed-to-run-first-tp7691p7964.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.