`show` has a `truncate` argument; pass `false` so it won't truncate your results.

On 7 February 2016 at 11:01, SLiZn Liu <sliznmail...@gmail.com> wrote:
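A minimal sketch of the fix, assuming the DataFrame from the quoted message is bound to `df` (in Spark 1.5, `show` truncates each cell to 20 characters unless you pass `truncate = false`; this snippet needs a running Spark shell or SparkContext to execute):

```scala
// `df` is assumed to be the DataFrame loaded via spark-csv below.
// Default show() cuts every cell to 20 characters, which is why the
// second column appeared only partially.
df.show(20, false) // print 20 rows without truncating column contents

// Equivalent shorthand available since Spark 1.5.0:
df.show(false)
```

With truncation disabled, the full timestamp strings and the table borders render correctly.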
> Plus, I’m using *Spark 1.5.2*, with *spark-csv 1.3.0*. Also tried
> HiveContext, but the result is exactly the same.
>
> On Sun, Feb 7, 2016 at 3:44 PM SLiZn Liu <sliznmail...@gmail.com> wrote:
>
>> Hi Spark Users Group,
>>
>> I have a csv file to analyze with Spark, but I’m having trouble
>> importing it as a DataFrame.
>>
>> Here’s a minimal reproducible example. Suppose I have a
>> *10(rows)x2(cols)* *space-delimited csv* file, shown below:
>>
>> 1446566430 2015-11-04<SP>00:00:30
>> 1446566430 2015-11-04<SP>00:00:30
>> 1446566430 2015-11-04<SP>00:00:30
>> 1446566430 2015-11-04<SP>00:00:30
>> 1446566430 2015-11-04<SP>00:00:30
>> 1446566431 2015-11-04<SP>00:00:31
>> 1446566431 2015-11-04<SP>00:00:31
>> 1446566431 2015-11-04<SP>00:00:31
>> 1446566431 2015-11-04<SP>00:00:31
>> 1446566431 2015-11-04<SP>00:00:31
>>
>> The <SP> in column 2 represents a sub-delimiter within that column.
>> This file is stored on HDFS; let’s say the path is hdfs:///tmp/1.csv.
>>
>> I’m using *spark-csv* to import this file as a Spark *DataFrame*:
>>
>> sqlContext.read.format("com.databricks.spark.csv")
>>   .option("header", "false")      // no header line in the files
>>   .option("inferSchema", "false") // don't infer data types
>>   .option("delimiter", " ")
>>   .load("hdfs:///tmp/1.csv")
>>   .show
>>
>> Oddly, the output shows only a part of each column:
>>
>> [image: Screenshot from 2016-02-07 15-27-51.png]
>>
>> and even the boundary of the table isn't shown correctly. I also tried
>> another way to read the csv file, via sc.textFile(...).map(_.split(" "))
>> and sqlContext.createDataFrame, and the result is the same. Can someone
>> point out where I went wrong?
>>
>> —
>> BR,
>> Todd Leo