Hello Felix,

I followed the instructions and ran the command:

> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0

and I received the following error message:
java.lang.RuntimeException: java.net.ConnectException: Call From xie1/192.168.112.150 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
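
From the Hadoop wiki page linked in the error, this looks like Spark resolving the input path against HDFS: localhost:9000 is the usual NameNode address, and "Connection refused" suggests no NameNode is listening there. If the file actually lives on the local filesystem, an explicit file:// URI should bypass HDFS. A minimal PySpark sketch (kept in Python for consistency with the snippet quoted below), assuming the shell was started with the same --packages flag; the path is only a placeholder:

# Placeholder path; the file:// scheme reads from the local filesystem
# instead of the default HDFS at localhost:9000.
df = sqlContext.read \
    .format("com.databricks.spark.csv") \
    .option("header", "true") \
    .load("file:///home/raymond/Employee.csv")
df.show()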

Any thoughts?



------------------------------------------------
Sincerely yours,

Raymond

On Fri, Dec 30, 2016 at 10:08 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
>
>
> ------------------------------
> From: Raymond Xie <xie3208...@gmail.com>
> Sent: Friday, December 30, 2016 6:46:11 PM
> To: user@spark.apache.org
> Subject: How to load a big csv to dataframe in Spark 1.6
>
> Hello,
>
> I see that a CSV is usually loaded into a DataFrame this way:
>
> from pyspark.sql import SQLContext  # not needed inside the pyspark shell
> sqlContext = SQLContext(sc)
>
> Employee_rdd = sc.textFile("../Employee.csv") \
>                  .map(lambda line: line.split(","))  # split each line on commas
>
> Employee_df = Employee_rdd.toDF(['Employee_ID', 'Employee_name'])
>
> Employee_df.show()
>
> However, in my case the CSV has 100+ fields, so listing them all in toDF()
> would be very lengthy.
>
> Can anyone tell me a practical method to load the data?
>
> Thank you very much.
>
>
> Raymond
>
>
