Re: specifying schema on dataframe

2017-02-04 Thread Sam Elamin
Hi Dirceu, Thanks, you're right! That did work. But now I'm facing an even bigger problem: since I don't have access to change the underlying data, I just want to apply a schema over something that was written via sparkContext.newAPIHadoopRDD. Basically, I am reading in an RDD[JsonObject] and would l

Re: specifying schema on dataframe

2017-02-04 Thread Dirceu Semighini Filho
Hi Sam, Remove the quotes (") around the number and it will work.

On 4 Feb 2017 11:46 AM, "Sam Elamin" wrote:
> Hi All
>
> I would like to specify a schema when reading from a json but when trying
> to map a number to a Double it fails, I tried FloatType and IntType with no
> joy!
>
>
> When inferr

Re: Remove support for Hadoop 2.5 and earlier?

2017-02-04 Thread Steve Loughran
On 3 Feb 2017, at 21:28, Jacek Laskowski <ja...@japila.pl> wrote: Hi Sean, Given that 3.0.0 is coming, removing the unused versions would be a huge benefit from a maintenance point of view. I'd support removing support for 2.5 and earlier. Speaking of Hadoop support, is anyone considering

specifying schema on dataframe

2017-02-04 Thread Sam Elamin
Hi All, I would like to specify a schema when reading JSON, but when I try to map a number to a Double it fails. I tried FloatType and IntegerType with no joy! When the schema is inferred, customer id is set to String, and I would like to cast it as Double, so df1 is corrupted while df2 shows A

How to checkpoint an RDD after a stage and before reaching an action?

2017-02-04 Thread leo9r
Hi, I have a single-action job (saveAsObjectFile at the end) that includes several stages. One of those stages is an expensive join, "rdd1.join(rdd2)". I would like to checkpoint rdd1 right before the join to improve the stability of the job. However, what I'm seeing is that the job gets executed all t