I tested this in master (the 1.5 release) and it worked as expected (with
spark.driver.maxResultSize changed to 10m):
>>> len(sc.range(10).map(lambda i: '*' * (1<<23) ).take(1))
1
>>> len(sc.range(10).map(lambda i: '*' * (1<<24) ).take(1))
15/08/10 10:45:55 ERROR TaskSetManager: Total size of serialized results ...
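
For anyone hitting the same error, the limit can be raised (or disabled) when building the context. A minimal sketch, assuming you construct your own SparkContext rather than use a preconfigured shell:

    from pyspark import SparkConf, SparkContext

    # spark.driver.maxResultSize caps the total size of serialized task
    # results collected back to the driver; setting it to "0" disables
    # the limit entirely.
    conf = SparkConf().set("spark.driver.maxResultSize", "2g")
    sc = SparkContext(conf=conf)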
Hi all,
I am getting some strange behavior with the RDD take function in PySpark
while doing some interactive coding in an IPython notebook. I am running
PySpark on Spark 1.2.0 in yarn-client mode, if that is relevant.
I am using sc.wholeTextFiles and pandas to load a collection of .csv files
into ...
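
(For reference, the usual wholeTextFiles-plus-pandas pattern is sketched below. The actual loading code was cut off in the message, and the "data/*.csv" path here is hypothetical.)

    import io
    import pandas as pd

    # sc.wholeTextFiles yields (path, file_contents) pairs, one per file,
    # so each whole CSV can be handed to pandas in a single call.
    def parse_csv(pair):
        path, contents = pair
        return pd.read_csv(io.StringIO(contents))

    dfs = sc.wholeTextFiles("data/*.csv").map(parse_csv)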