I tried this with 1.5.0, 1.6.0, and the 2.0.0 trunk, each with com.databricks:spark-csv_2.10:1.3.0, and got the expected results: the columns are read properly.
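For reference, here is a minimal sketch of the kind of read that produced the output below (same options and hdfs:///tmp/1.csv path as in your snippet; note that show's default truncates string values longer than 20 characters, so I pass truncate = false):

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "false")      // the file has no header row
      .option("inferSchema", "false") // keep both columns as strings
      .option("delimiter", " ")
      .load("hdfs:///tmp/1.csv")

    df.show(10, false) // numRows = 10, truncate = false

This prints the full column values: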
+----------+----------------------+
|C0        |C1                    |
+----------+----------------------+
|1446566430|2015-11-04<SP>00:00:30|
|1446566430|2015-11-04<SP>00:00:30|
|1446566430|2015-11-04<SP>00:00:30|
|1446566430|2015-11-04<SP>00:00:30|
|1446566430|2015-11-04<SP>00:00:30|
|1446566431|2015-11-04<SP>00:00:31|
|1446566431|2015-11-04<SP>00:00:31|
|1446566431|2015-11-04<SP>00:00:31|
|1446566431|2015-11-04<SP>00:00:31|
|1446566431|2015-11-04<SP>00:00:31|
+----------+----------------------+

On Sat, Feb 6, 2016 at 11:44 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:

> Hi Spark Users Group,
>
> I have a csv file to analyze with Spark, but I'm having trouble importing
> it as a DataFrame.
>
> Here's a minimal reproducible example. Suppose I have a *10(rows)x2(cols)*
> *space-delimited csv* file, shown below:
>
> 1446566430 2015-11-04<SP>00:00:30
> 1446566430 2015-11-04<SP>00:00:30
> 1446566430 2015-11-04<SP>00:00:30
> 1446566430 2015-11-04<SP>00:00:30
> 1446566430 2015-11-04<SP>00:00:30
> 1446566431 2015-11-04<SP>00:00:31
> 1446566431 2015-11-04<SP>00:00:31
> 1446566431 2015-11-04<SP>00:00:31
> 1446566431 2015-11-04<SP>00:00:31
> 1446566431 2015-11-04<SP>00:00:31
>
> The <SP> in column 2 represents a sub-delimiter within that column, and
> the file is stored on HDFS; let's say the path is hdfs:///tmp/1.csv
>
> I'm using *spark-csv* to import this file as a Spark *DataFrame*:
>
> sqlContext.read.format("com.databricks.spark.csv")
>   .option("header", "false")      // the file has no header row
>   .option("inferSchema", "false") // do not infer data types; keep strings
>   .option("delimiter", " ")
>   .load("hdfs:///tmp/1.csv")
>   .show
>
> Oddly, the output shows only part of each column:
>
> [image: Screenshot from 2016-02-07 15-27-51.png]
>
> and even the table borders aren't drawn correctly. I also tried the other
> way of reading a csv file, via sc.textFile(...).map(_.split(" ")) and
> sqlContext.createDataFrame, and the result is the same. Can someone point
> out where I went wrong?
>
> —
> BR,
> Todd Leo

--
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
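P.S. For completeness, a sketch of the RDD-based route mentioned in the quoted message. It assumes <SP> is a literal token in the file, so each line splits into exactly two fields; the Record case class and its column names are illustrative, not from the original code:

    // hypothetical two-column schema for the sample data
    case class Record(c0: String, c1: String)

    val rdd = sc.textFile("hdfs:///tmp/1.csv")
      .map(_.split(" "))            // single-space delimiter -> two fields
      .map(a => Record(a(0), a(1)))

    val df = sqlContext.createDataFrame(rdd)
    df.show(10, false) // truncate = false prints full column values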