Resolution: After realizing that the SerDe (OpenCSV) was causing all the fields to be typed as String, I modified the Hive "load" statement to use the default serializer and changed the CSV input file to use a different delimiter. Although this is a workaround, I can proceed with it for now.
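The change described above might look something like the following sketch. The original DDL is not shown in the thread, so the table name, column names, delimiter, and file path here are all hypothetical:

```sql
-- Original definition: OpenCSVSerde reports every column as STRING,
-- regardless of the types declared in the DDL.
CREATE TABLE sales_csv (c18 INT, c21 INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde';

-- Workaround: use Hive's default SerDe (LazySimpleSerDe) with a custom
-- field delimiter, so the declared column types are preserved.
CREATE TABLE sales_delim (c18 INT, c21 INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

LOAD DATA INPATH '/data/sales.psv' INTO TABLE sales_delim;
```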
- Ranga

On Wed, Oct 8, 2014 at 9:18 PM, Ranga <sra...@gmail.com> wrote:

> This is a bit strange. When I print the schema for the RDD, it reflects
> the correct data type for each column. But doing any kind of mathematical
> calculation seems to result in a ClassCastException. Here is a sample that
> results in the exception:
>
> select c1, c2
> ...
> cast (c18 as int) * cast (c21 as int)
> ...
> from table
>
> Any other pointers? Thanks for the help.
>
> - Ranga
>
> On Wed, Oct 8, 2014 at 5:20 PM, Ranga <sra...@gmail.com> wrote:
>
>> Sorry. It's 1.1.0.
>> After digging a bit more into this, it seems like the OpenCSV deserializer
>> converts all the columns to a String type. This may be throwing the
>> execution off. Planning to create a class and map the rows to this custom
>> class. Will keep this thread updated.
>>
>> On Wed, Oct 8, 2014 at 5:11 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> Which version of Spark are you running?
>>>
>>> On Wed, Oct 8, 2014 at 4:18 PM, Ranga <sra...@gmail.com> wrote:
>>>
>>>> Thanks Michael. Should the cast be done in the source RDD or while
>>>> doing the SUM?
>>>> To give a better picture, here is the code sequence:
>>>>
>>>> val sourceRdd = sql("select ... from source-hive-table")
>>>> sourceRdd.registerAsTable("sourceRDD")
>>>> val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1, c2")
>>>> // The query above throws the exception when I collect the results
>>>>
>>>> I tried adding the cast to the aggRdd query above and that didn't help.
>>>>
>>>> - Ranga
>>>>
>>>> On Wed, Oct 8, 2014 at 3:52 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>>
>>>>> Using SUM on a string should automatically cast the column. Also, you
>>>>> can use CAST to change the datatype
>>>>> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions>.
>>>>>
>>>>> What version of Spark are you running? This could be
>>>>> https://issues.apache.org/jira/browse/SPARK-1994
>>>>>
>>>>> On Wed, Oct 8, 2014 at 3:47 PM, Ranga <sra...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I am in the process of migrating some logic in Pig scripts to
>>>>>> Spark SQL. As part of this process, I am creating a few "Select...Group By"
>>>>>> queries and registering them as tables using the SchemaRDD.registerAsTable
>>>>>> feature.
>>>>>> When using such a registered table in a subsequent "Select...Group By"
>>>>>> query, I get a ClassCastException:
>>>>>> java.lang.ClassCastException: java.lang.String cannot be cast to
>>>>>> java.lang.Integer
>>>>>>
>>>>>> This happens when I use the SUM function on one of the columns. Is
>>>>>> there any way to specify the data type for the columns when the
>>>>>> registerAsTable function is called? Are there other approaches that I
>>>>>> should be looking at?
>>>>>>
>>>>>> Thanks for your help.
>>>>>>
>>>>>> - Ranga
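The exception in this thread is the standard JVM error raised when a value that is actually a java.lang.String is downcast to Integer at runtime, which is what happens when the SerDe hands Spark SQL strings for columns the query plan expects to be integers. A minimal, Spark-free illustration (the variable names are hypothetical):

```java
public class CastDemo {
    public static void main(String[] args) {
        // The SerDe produced a String, but the downstream code expects an Integer.
        Object fromSerDe = "42";

        try {
            Integer value = (Integer) fromSerDe; // fails at runtime, not compile time
            System.out.println(value);
        } catch (ClassCastException e) {
            // e.g. "java.lang.String cannot be cast to java.lang.Integer"
            // (exact wording varies by JVM version)
            System.out.println("ClassCastException: " + e.getMessage());
        }

        // Parsing, rather than casting, is the safe conversion from String.
        int parsed = Integer.parseInt((String) fromSerDe);
        System.out.println(parsed); // prints 42
    }
}
```

This is why CAST in the query alone did not help: the cast operates on what the plan believes is already an integer column, while the underlying rows still contain strings.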