Hmm, this seems unrelated to the package. Does it work on the same box without the package? Do you have more of the error stack you can share?
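For what it's worth, "Connection refused" on localhost:9000 usually means Spark is trying to reach an HDFS namenode (fs.defaultFS pointing at hdfs://localhost:9000) that isn't running, rather than anything in spark-csv itself. A minimal sketch of a workaround, assuming the file actually lives on the local filesystem; the path below is hypothetical:

    # Read the CSV with an explicit file:// URI so Spark skips HDFS entirely.
    # "/home/raymond/Employee.csv" is a made-up local path; substitute your own.
    df = (sqlContext.read
          .format("com.databricks.spark.csv")
          .options(header="true", inferSchema="true")
          .load("file:///home/raymond/Employee.csv"))

If the data is supposed to be on HDFS instead, the fix is the other direction: start the namenode and check fs.defaultFS in core-site.xml.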
_____________________________
From: Raymond Xie <xie3208...@gmail.com>
Sent: Saturday, December 31, 2016 8:09 AM
Subject: Re: How to load a big csv to dataframe in Spark 1.6
To: Felix Cheung <felixcheun...@hotmail.com>
Cc: user@spark.apache.org

Hello Felix,

I followed the instructions and ran the command:

    $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0

and I received the following error message:

    java.lang.RuntimeException: java.net.ConnectException: Call From xie1/192.168.112.150 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

Any thoughts?

------------------------------------------------
Sincerely yours,

Raymond

On Fri, Dec 30, 2016 at 10:08 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

Have you tried the spark-csv package?

https://spark-packages.org/package/databricks/spark-csv

________________________________
From: Raymond Xie <xie3208...@gmail.com>
Sent: Friday, December 30, 2016 6:46:11 PM
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6

Hello,

I see there is usually this way to load a csv to a dataframe:

    sqlContext = SQLContext(sc)
    Employee_rdd = sc.textFile("\..\Employee.csv").map(lambda line: line.split(","))
    Employee_df = Employee_rdd.toDF(['Employee_ID', 'Employee_name'])
    Employee_df.show()

However, in my case my csv has 100+ fields, which means the toDF() call would be very lengthy. Can anyone tell me a practical method to load the data?

Thank you very much.

Raymond
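On the original question: with 100+ fields, one way to avoid typing every column into toDF() is to derive the column names from the file's header row. A minimal sketch, assuming PySpark on Spark 1.6 and a comma-separated file whose first line holds the column names; "Employee.csv" is the example path from the thread:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext()
    sqlContext = SQLContext(sc)

    raw = sc.textFile("Employee.csv")
    header = raw.first()                    # first line holds the column names
    columns = header.split(",")

    rows = (raw.filter(lambda line: line != header)  # drop the header row
               .map(lambda line: line.split(",")))

    df = rows.toDF(columns)                 # column list is built, not hand-typed
    df.show()

The spark-csv package suggested above does the same job more robustly (quoting, type inference) via header='true' and inferSchema='true', so the manual split is mainly useful when adding a package is not an option.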