Which version of Spark are you running?

On Wed, Oct 8, 2014 at 4:18 PM, Ranga <sra...@gmail.com> wrote:

> Thanks Michael. Should the cast be done in the source RDD or while doing
> the SUM?
> To give a better picture here is the code sequence:
>
> val sourceRdd = sql("select ... from source-hive-table")
> sourceRdd.registerAsTable("sourceRDD")
> val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1, c2")
>  // This query throws the exception when I collect the results
>
> I tried adding the cast to the aggRdd query above and that didn't help.
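>
> For reference, the cast I tried looks roughly like this (assuming c3 is
> stored as a string in the Hive table; the int target type is just what I
> expect the column to hold):
>
> val aggRdd = sql(
>   "select c1, c2, sum(cast(c3 as int)) from sourceRDD group by c1, c2")
> aggRdd.collect()  // still throws the same ClassCastException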
>
>
> - Ranga
>
> On Wed, Oct 8, 2014 at 3:52 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Using SUM on a string should automatically cast the column. You can also
>> use CAST to change the datatype explicitly; see the Hive type conversion
>> functions:
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions
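>>
>> For example, something like this should give you a table with a properly
>> typed column (table and column names here are hypothetical):
>>
>> sql("SELECT CAST(c3 AS INT) AS c3 FROM myTable").registerAsTable("typedTable")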
>>
>> What version of Spark are you running?  This could be
>> https://issues.apache.org/jira/browse/SPARK-1994
>>
>> On Wed, Oct 8, 2014 at 3:47 PM, Ranga <sra...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I am in the process of migrating some logic in Pig scripts to Spark SQL.
>>> As part of this process, I am creating a few "Select...Group By" queries
>>> and registering them as tables using the SchemaRDD.registerAsTable
>>> feature. When I use such a registered table in a subsequent
>>> "Select...Group By" query, I get a ClassCastException:
>>> java.lang.ClassCastException: java.lang.String cannot be cast to
>>> java.lang.Integer
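>>>
>>> To illustrate the pattern (table and column names are made up):
>>>
>>> val first = sql("select c1, c2, c3 from some_hive_table")
>>> first.registerAsTable("firstTable")
>>> // Collecting this second query is what throws the exception above:
>>> val agg = sql("select c1, c2, sum(c3) from firstTable group by c1, c2")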
>>>
>>> This happens when I use the SUM function on one of the columns. Is there
>>> any way to specify the data types of the columns when the registerAsTable
>>> function is called? Are there other approaches I should be looking at?
>>>
>>> Thanks for your help.
>>>
>>>
>>>
>>> - Ranga
>>>
>>
>>
>
