There are two issues here:
1. Suppression of the true reason for failure. The Spark runtime reports
"TypeError", but that is not why the operation failed.
2. The low performance of loading a pandas dataframe.
DISCUSSION
Number (1) is easily fixed, and is the primary purpose of my post.
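To illustrate the kind of fix meant for (1), here is a minimal sketch in plain Python. It is not the actual pyspark code: convert(), create_masking() and create_preserving() are hypothetical names made up for this example, and the MemoryError is only an assumed stand-in for whatever really went wrong. It shows how a broad "except Exception:" can replace the real error with a TypeError, and how the original exception can be kept attached instead.

```python
def convert(data):
    # Hypothetical stand-in for a conversion step that can fail
    # for reasons unrelated to the type of its input.
    raise MemoryError("ran out of memory while converting")


def create_masking(data):
    # Pattern that hides the cause: any failure is reported as a
    # TypeError whose message mentions only the input type.
    try:
        return convert(data)
    except Exception:
        raise TypeError("cannot create an RDD from type: %s" % type(data).__name__)


def create_preserving(data):
    # One possible fix: put the original error in the message and
    # chain it explicitly, so it is available as __cause__.
    try:
        return convert(data)
    except Exception as exc:
        raise TypeError(
            "cannot create an RDD from type: %s (caused by %r)"
            % (type(data).__name__, exc)
        ) from exc
```

Another option would be to narrow the except clause so that it catches only the errors that really indicate an unsupported type, and let everything else propagate unchanged.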
Number (2)
Hello,
Similar to the thread below [1], when I tried to create an RDD from a 4GB
pandas dataframe I encountered the error
TypeError: cannot create an RDD from type:
However, looking into the code shows this is raised from a generic "except
Exception:" predicate (pyspark/sql/context.py:238 in