Re: specifing schema on dataframe

Sam Elamin Sat, 04 Feb 2017 06:23:01 -0800

Hi Direceu

Thanks your right! that did work



But now im facing an even bigger problem since i dont have access to change
the underlying data, I just want to apply a schema over something that was
written via the sparkContext.newAPIHadoopRDD

Basically I am reading in a RDD[JsonObject] and would like to convert it
into a dataframe which I pass the schema into

Whats the best way to do this?

I doubt removing all the quotes in the JSON is the best solution is it?

Regards
Sam

On Sat, Feb 4, 2017 at 2:13 PM, Dirceu Semighini Filho <
dirceu.semigh...@gmail.com> wrote:

> Hi Sam
> Remove the " from the number that it will work
>
> Em 4 de fev de 2017 11:46 AM, "Sam Elamin" <hussam.ela...@gmail.com>
> escreveu:
>
>> Hi All
>>
>> I would like to specify a schema when reading from a json but when trying
>> to map a number to a Double it fails, I tried FloatType and IntType with no
>> joy!
>>
>>
>> When inferring the schema customer id is set to String, and I would like
>> to cast it as Double
>>
>> so df1 is corrupted while df2 shows
>>
>>
>> Also FYI I need this to be generic as I would like to apply it to any
>> json, I specified the below schema as an example of the issue I am facing
>>
>> import org.apache.spark.sql.types.{BinaryType, StringType, StructField, 
>> DoubleType,FloatType, StructType, LongType,DecimalType}
>> val testSchema = StructType(Array(StructField("customerid",DoubleType)))
>> val df1 = 
>> spark.read.schema(testSchema).json(sc.parallelize(Array("""{"customerid":"535137"}""")))
>> val df2 = 
>> spark.read.json(sc.parallelize(Array("""{"customerid":"535137"}""")))
>> df1.show(1)
>> df2.show(1)
>>
>>
>> Any help would be appreciated, I am sure I am missing something obvious
>> but for the life of me I cant tell what it is!
>>
>>
>> Kind Regards
>> Sam
>>
>

Re: specifing schema on dataframe

Reply via email to