> From: Hossein Falaki
> Date: Wednesday, June 3, 2015 at 12:55 PM
> To: "shiva...@eecs.berkeley.edu"
> Cc: Aleksander Eskilson, "dev@spark.apache.org"
>
> Subject: Re: SparkR DataFrame Column Casts esp. from CSV Files
>
Cc: Aleksander Eskilson <eskil...@cerner.com>, "dev@spark.apache.org" <dev@spark.apache.org>
Subject: Re: SparkR DataFrame Column Casts esp. from CSV Files
Yes, spark-csv does not infer types yet, but it is planned to be implemented
soon.
To work around the current limitations (of spark-csv and SparkR), you can
specify the schema in read.df() to get your desired types from spark-csv. For
example:
myschema <- structType(structField("id", "integer"
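As a language-neutral illustration of specifying the schema up front, instead of accepting all-string columns, here is a minimal Python sketch using only the standard library. The `(name, type)` schema format, `CONVERTERS`, and `read_csv_with_schema` are hypothetical names that only loosely mirror the idea behind `structType`/`structField`; they are not the SparkR or spark-csv API.

```python
import csv
import io

# Hypothetical mapping from declared type names to Python converters.
CONVERTERS = {
    "integer": int,
    "double": float,
    "string": str,
}


def read_csv_with_schema(text, schema):
    """Parse CSV text, converting each field to the type declared in schema.

    schema is a hypothetical list of (column name, type name) pairs.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    names = [name for name, _ in schema]
    if header != names:
        raise ValueError(f"header {header} does not match schema {names}")
    rows = []
    for record in reader:
        if len(record) != len(schema):
            raise ValueError(f"expected {len(schema)} fields, got {record}")
        rows.append({
            name: CONVERTERS[type_name](value)
            for (name, type_name), value in zip(schema, record)
        })
    return rows


schema = [("id", "integer"), ("name", "string"), ("score", "double")]
data = "id,name,score\n1,alice,3.5\n2,bob,4.0\n"
rows = read_csv_with_schema(data, schema)
print(rows[0])  # {'id': 1, 'name': 'alice', 'score': 3.5}
```

Declaring the types at read time means a malformed value fails loudly (here `int("abc")` would raise `ValueError`) rather than silently leaving the whole column as strings.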
schema after loading a DF.
>
> Thanks,
> Alek
>
>
> From: Shivaram Venkataraman
> Reply-To: "shiva...@eecs.berkeley.edu"
> Date: Wednesday, June 3, 2015 at 12:29 PM
> To: Aleksander Eskilson
> Cc: "dev@spark.apache.org", "hoss...@databricks.com"
Cc: "dev@spark.apache.org" <dev@spark.apache.org>, "hoss...@databricks.com" <hoss...@databricks.com>
Subject: Re: SparkR DataFrame Column Casts esp. from CSV Files
cc Hossein who knows more about the spark-csv options
You are right that the default CSV reader options end up creating all
columns as string. I know that the JSON reader infers the schema [1] but I
don't know if the CSV reader has any options to do that. Regarding the
SparkR syntax to cast colum
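Schema inference of the kind the JSON reader performs, and that the CSV reader does not yet do, can be sketched as choosing the narrowest type that fits every value in a column. The following minimal Python sketch assumes only integers, doubles, and strings matter and treats empty fields as missing values; `infer_type` and `infer_schema` are hypothetical helpers, not spark-csv's planned implementation.

```python
import csv
import io


def _parses(convert, value):
    """Return True if convert(value) succeeds."""
    try:
        convert(value)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Pick the narrowest of 'integer', 'double', 'string' fitting all values.

    Empty strings are treated as missing and ignored (an assumption).
    """
    present = [v for v in values if v != ""]
    if not present:
        return "string"  # nothing to go on, so fall back to string
    if all(_parses(int, v) for v in present):
        return "integer"
    if all(_parses(float, v) for v in present):
        return "double"
    return "string"


def infer_schema(text):
    """Return (column name, inferred type) pairs for CSV text with a header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    columns = [
        [row[i] if i < len(row) else "" for row in rows]
        for i in range(len(header))
    ]
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


data = "id,name,score\n1,alice,3.5\n2,bob,4\n"
print(infer_schema(data))
# [('id', 'integer'), ('name', 'string'), ('score', 'double')]
```

Python's `float()` also accepts strings such as `"nan"` and `"inf"`, so a real inferrer would need stricter rules for what counts as a number.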
It appears that casting columns remains a bit tricky in Spark’s DataFrames.
This is an issue because tools like spark-csv will set column types to String
by default and will not attempt to infer types. Although spark-csv supports
specifying types for columns in its options, it’s not clear h
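Casting after loading amounts to converting an existing string column to a target type. The minimal Python sketch below works on rows held as plain dictionaries; `cast_column` is a hypothetical helper, and mapping unparsable values to `None` is a design choice for this example (loosely similar to a SQL cast producing NULL), not a claim about how Spark's cast behaves.

```python
def cast_column(rows, column, convert):
    """Return a copy of rows with `column` passed through `convert`.

    Values that cannot be converted become None instead of raising,
    a choice made for this sketch.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the loaded rows are left untouched
        try:
            new_row[column] = convert(row[column])
        except (TypeError, ValueError):
            new_row[column] = None
        result.append(new_row)
    return result


# All-string rows, as a CSV reader without type inference would produce.
loaded = [{"id": "1", "age": "34"}, {"id": "2", "age": "n/a"}]
typed = cast_column(loaded, "age", int)
print(typed)  # [{'id': '1', 'age': 34}, {'id': '2', 'age': None}]
```

Returning new rows instead of mutating the input keeps the original all-string data available if a cast turns out to be wrong.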