Re: Imported CSV file content isn't identical to the original file

2016-02-14 Thread SLiZn Liu
This error message no longer appears now that I have upgraded to 1.6.0. -- Cheers, Todd Leo On Tue, Feb 9, 2016 at 9:07 AM SLiZn Liu wrote: > At least it works for me, though: I temporarily disabled the Kryo serializer until I > upgrade to 1.6.0. I appreciate your update. :) > On Tue, Feb 9, 2016 at 02:37, Luciano Resende wrote: >

Re: Imported CSV file content isn't identical to the original file

2016-02-08 Thread SLiZn Liu
At least it works for me, though: I temporarily disabled the Kryo serializer until I upgrade to 1.6.0. I appreciate your update. :) On Tue, Feb 9, 2016 at 02:37, Luciano Resende wrote: > Sorry, same expected results with trunk and Kryo serializer > > On Mon, Feb 8, 2016 at 4:15 AM, SLiZn Liu wrote: > >> I’ve found the tri

Re: Imported CSV file content isn't identical to the original file

2016-02-08 Thread Luciano Resende
Sorry, I get the same expected results with trunk and the Kryo serializer. On Mon, Feb 8, 2016 at 4:15 AM, SLiZn Liu wrote: > I’ve found the trigger of my issue: if I start my spark-shell or submit > by spark-submit with --conf > spark.serializer=org.apache.spark.serializer.KryoSerializer, the > DataFrame cont

Re: Imported CSV file content isn't identical to the original file

2016-02-08 Thread SLiZn Liu
I’ve found the trigger of my issue: if I start spark-shell, or submit via spark-submit, with --conf spark.serializer=org.apache.spark.serializer.KryoSerializer, the DataFrame content goes wrong, as I described earlier. On Mon, Feb 8, 2016 at 5:42 PM SLiZn Liu wrote: > Thanks Luciano, now it lo
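The report above suggests an A/B reproduction in which the only difference between two runs is the serializer setting. A minimal sketch of how such command lines could be assembled follows. The helper name `spark_shell_args` is hypothetical and not from the thread; the `--packages` coordinate is the one Luciano mentions, and `JavaSerializer` is assumed to be the default that the Kryo run is compared against.

```python
# Hypothetical helper (not from the thread): build spark-shell argument lists
# that differ only in the spark.serializer setting, for an A/B reproduction.
KRYO = "org.apache.spark.serializer.KryoSerializer"
JAVA = "org.apache.spark.serializer.JavaSerializer"  # assumed default serializer
SPARK_CSV = "com.databricks:spark-csv_2.10:1.3.0"    # coordinate quoted in the thread


def spark_shell_args(serializer=None):
    """Return the argv list for spark-shell, optionally pinning the serializer."""
    args = ["spark-shell", "--packages", SPARK_CSV]
    if serializer is not None:
        # Same flag form as in the report: --conf spark.serializer=<class>
        args += ["--conf", "spark.serializer=" + serializer]
    return args


if __name__ == "__main__":
    # Print one command per variant so the two runs can be compared by hand.
    for variant in (None, JAVA, KRYO):
        print(" ".join(spark_shell_args(variant)))
```

Running the same CSV import under each printed command and comparing the first rows would show whether the serializer alone changes the result.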

Re: Imported CSV file content isn't identical to the original file

2016-02-08 Thread SLiZn Liu
Thanks Luciano, now it looks like I’m the only one who has this issue. My options have narrowed down to upgrading Spark to 1.6.0 to see if the issue goes away. -- Cheers, Todd Leo On Mon, Feb 8, 2016 at 2:12 PM Luciano Resende wrote: > I tried 1.5.0, 1.6.0 and 2.0.0 trunk with > com.da

Re: Imported CSV file content isn't identical to the original file

2016-02-07 Thread Luciano Resende
I tried 1.5.0, 1.6.0 and 2.0.0 trunk with com.databricks:spark-csv_2.10:1.3.0 and got the expected results: the columns seem to be read properly.
+--+--+
|C0|C1|
+--+--+
|1446566430 | 2015-11-0400:00:30|
|14

Re: Imported CSV file content isn't identical to the original file

2016-02-07 Thread SLiZn Liu
*Update*: in local mode (spark-shell --master local[2], whether reading from the local file system or HDFS), it works well. But that doesn’t solve the issue, since my data scale requires hundreds of CPU cores and hundreds of GB of RAM. BTW, it’s Chinese New Year now; I wish you all a happy year and h

Re: Imported CSV file content isn't identical to the original file

2016-02-07 Thread SLiZn Liu
Hi Igor, In my case, it’s not a matter of *truncate*. As the Spark API doc for the show() function reads: truncate: Whether truncate long strings. If true, strings more than 20 characters will be truncated and all cells will be aligned right… whereas the leading characters of my two columns are missi
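To illustrate why truncation would not explain missing leading characters, here is a small pure-Python model of the rule quoted from the API doc. It is a sketch, not Spark code: the function name `show_cell` is hypothetical, and the detail that a long cell is cut to its first 17 characters plus "..." is an assumption about Spark 1.x behavior rather than something stated in the thread.

```python
def show_cell(value, truncate=True):
    """Model how show() might render one cell (assumed Spark 1.x behavior)."""
    text = str(value)
    if truncate and len(text) > 20:
        # Assumption: keep the leading 17 characters and append an ellipsis,
        # so any loss happens at the END of the string, never at the start.
        return text[:17] + "..."
    return text


if __name__ == "__main__":
    # The timestamp column from the thread is 18 characters, under the limit.
    print(show_cell("2015-11-0400:00:30"))
    # A longer value loses its tail only.
    print(show_cell("abcdefghijklmnopqrstuvwxyz"))
    print(show_cell("abcdefghijklmnopqrstuvwxyz", truncate=False))
```

Under this model the sample values are short enough to pass through unchanged, and even a truncated value keeps its leading characters, which is consistent with the point that the symptom is not a display-truncation effect.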

Re: Imported CSV file content isn't identical to the original file

2016-02-07 Thread Igor Berman
show() has a truncate argument; pass false so it won’t truncate your results. On 7 February 2016 at 11:01, SLiZn Liu wrote: > Plus, I’m using *Spark 1.5.2*, with *spark-csv 1.3.0*. Also tried > HiveContext, but the result is exactly the same. > > On Sun, Feb 7, 2016 at 3:44 PM SLiZn Liu wrote:

Re: Imported CSV file content isn't identical to the original file

2016-02-07 Thread SLiZn Liu
Plus, I’m using *Spark 1.5.2*, with *spark-csv 1.3.0*. I also tried HiveContext, but the result is exactly the same. On Sun, Feb 7, 2016 at 3:44 PM SLiZn Liu wrote: > Hi Spark Users Group, > > I have a csv file to analyze with Spark, but I’m having trouble importing it > as a DataFrame. > > Here’s t

Imported CSV file content isn't identical to the original file

2016-02-06 Thread SLiZn Liu
Hi Spark Users Group, I have a csv file to analyze with Spark, but I’m having trouble importing it as a DataFrame. Here’s the minimal reproducible example. Suppose I have a *10(rows)x2(cols)* *space-delimited csv* file, shown below:
1446566430 2015-11-0400:00:30
1446566430 2015-11-0400:00:30
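As a sanity check independent of Spark, the expected parse of the sample rows can be sketched with Python's standard csv module. This is only an illustration: the helper names `parse_space_delimited` and `mismatches` are hypothetical, and `SAMPLE` holds just the two rows visible in the message rather than the full ten-row file.

```python
import csv
import io

# Only the two sample rows visible in the message; the real file has ten.
SAMPLE = "1446566430 2015-11-0400:00:30\n1446566430 2015-11-0400:00:30\n"


def parse_space_delimited(text):
    """Split each non-empty line into fields on single spaces."""
    reader = csv.reader(io.StringIO(text), delimiter=" ")
    return [row for row in reader if row]


def mismatches(text, rows):
    """Return (line_number, original, rebuilt) for rows that differ from the file."""
    problems = []
    lines = [line for line in text.splitlines() if line]
    for number, (line, row) in enumerate(zip(lines, rows), start=1):
        rebuilt = " ".join(row)
        if rebuilt != line:
            problems.append((number, line, rebuilt))
    return problems


if __name__ == "__main__":
    rows = parse_space_delimited(SAMPLE)
    print(rows)
    print(mismatches(SAMPLE, rows))
```

Rows collected from a DataFrame could be passed to `mismatches` in the same way to pinpoint which lines, if any, lost characters on import.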