Pandas performance is definitely the issue here. You're using Pandas as an
ETL system, and it's better suited as an endpoint than a conduit.
That is, it's great to dump your data there and do your analysis within
Pandas, subject to its constraints, but if you need to "back out" and use
some
There are two issues here:
1. Suppression of the true reason for failure. The Spark runtime reports
"TypeError", but that is not why the operation failed.
2. The low performance of loading a pandas dataframe into Spark.
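A minimal sketch of issue (2) and one common workaround. The file and column names here are illustrative, and the Spark calls (which vary by version) are shown only in comments:

```python
# Sketch: pushing a pandas frame into Spark through the driver is slow,
# because rows are pickled and shipped one at a time. Writing the frame
# to a file the executors can read in parallel is a common workaround.
import pandas as pd

pdf = pd.DataFrame({"id": range(1000), "value": [i * 0.5 for i in range(1000)]})

# Slow path described in this thread (not run here):
#   rdd = sc.parallelize(pdf.values)   # or sqlContext.createDataFrame(pdf)

# Workaround: back the data out of pandas to disk...
pdf.to_csv("frame.csv", index=False)

# ...and let Spark's executors load it directly (not run here; the exact
# reader API depends on your Spark version):
#   df = sqlContext.read.format("csv").load("frame.csv")
```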
DISCUSSION
Number (1) is easily fixed, and is the primary purpose of my post.
Number (2)
Why not attach a bigger hard disk to the machines and point your
SPARK_LOCAL_DIRS to it?
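For example, a minimal spark-env.sh fragment, assuming the larger disk is mounted at /mnt/bigdisk (the mount point is an assumption; SPARK_LOCAL_DIRS is Spark's standard setting for local scratch space):

```shell
# In conf/spark-env.sh on each worker node.
# /mnt/bigdisk is an assumed mount point for the larger disk;
# SPARK_LOCAL_DIRS controls where Spark spills shuffle/scratch data.
export SPARK_LOCAL_DIRS=/mnt/bigdisk/spark-local
```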
Thanks
Best Regards
On Sat, Aug 29, 2015 at 1:13 AM, fsacerdoti wrote:
> Hello,
>
> Similar to the thread below [1], when I tried to create an RDD from a 4GB
> pandas dataframe I encountered the error
>
>