Hmm, this error seems unrelated to the package. Does it work on the same box 
without the package? Do you have more of the error stack you can share?
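
For what it's worth, localhost:9000 is the default HDFS namenode port, so a path 
with no scheme may be getting resolved against an HDFS that isn't running. A rough 
sketch, assuming the file actually lives on local disk (the path below is only a 
placeholder), is to pass an explicit file:// URI:

# The file:// scheme keeps Spark from resolving the path against HDFS on localhost:9000.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("file:///path/to/Employee.csv"))
df.show()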


_____________________________
From: Raymond Xie <xie3208...@gmail.com>
Sent: Saturday, December 31, 2016 8:09 AM
Subject: Re: How to load a big csv to dataframe in Spark 1.6
To: Felix Cheung <felixcheun...@hotmail.com>
Cc: <user@spark.apache.org>


Hello Felix,

I followed the instruction and ran the command:

> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0

and I received the following error message:
java.lang.RuntimeException: java.net.ConnectException: Call From 
xie1/192.168.112.150 to localhost:9000 failed on 
connection exception: java.net.ConnectException: Connection refused; For more 
details see:  http://wiki.apache.org/hadoop/ConnectionRefused

Any thoughts?



------------------------------------------------
Sincerely yours,


Raymond

On Fri, Dec 30, 2016 at 10:08 PM, Felix Cheung 
<felixcheun...@hotmail.com> wrote:
Have you tried the spark-csv package?

https://spark-packages.org/package/databricks/spark-csv
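
With that package on the classpath, something along these lines should work in 
Spark 1.6 without listing the columns by hand (a rough sketch; the file name and 
options are placeholders to adjust):

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# header=true takes the column names from the first line of the file;
# inferSchema=true lets spark-csv guess the column types.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("Employee.csv"))
df.show()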


________________________________
From: Raymond Xie <xie3208...@gmail.com>
Sent: Friday, December 30, 2016 6:46:11 PM
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6

Hello,

I see this is the usual way to load a CSV into a dataframe:


from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
Employee_rdd = sc.textFile("\..\Employee.csv") \
        .map(lambda line: line.split(","))
Employee_df = Employee_rdd.toDF(['Employee_ID', 'Employee_name'])
Employee_df.show()

However, in my case my CSV has 100+ fields, which means the toDF() call will be 
very lengthy.
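
One rough sketch I can think of (assuming the first line of the file is a header 
row) is to take the column names from the file itself instead of typing them out:

Employee_rdd = sc.textFile("Employee.csv") \
        .map(lambda line: line.split(","))

# Use the header row as the column list and drop it from the data.
header = Employee_rdd.first()
Employee_df = Employee_rdd.filter(lambda row: row != header).toDF(header)
Employee_df.show()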

Can anyone tell me a practical method to load the data?

Thank you very much.


Raymond



