Resolution: After realizing that the SerDe (OpenCSV) was causing all the fields to be typed as String, I modified the Hive "load" statement to use the default serializer and changed the CSV input file to use a different delimiter. Although this is a workaround, I can proceed with it for now.
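The change described above might look something like the following sketch. The original DDL is not shown in the thread, so the table name, column names, delimiter, and file path here are all hypothetical:

```sql
-- Original definition: OpenCSVSerde reports every column as STRING,
-- regardless of the types declared in the DDL.
CREATE TABLE sales_csv (c18 INT, c21 INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde';

-- Workaround: use Hive's default SerDe (LazySimpleSerDe) with a custom
-- field delimiter, so the declared column types are preserved.
CREATE TABLE sales_delim (c18 INT, c21 INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

LOAD DATA INPATH '/data/sales.psv' INTO TABLE sales_delim;
```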
- Ranga

On Wed, Oct 8, 2014 at 9:18 PM, Ranga <sra...@gmail.com> wrote:

> This is a bit strange. When I print the schema for the RDD, it reflects
> the correct data type for each column. But doing any kind of mathematical
> calculation seems to result in a ClassCastException. Here is a sample that
> results in the exception:
>
> select c1, c2
> ...
> cast (c18 as int) * cast (c21 as int)
> ...
> from table
>
> Any other pointers? Thanks for the help.
>
> - Ranga
>
> On Wed, Oct 8, 2014 at 5:20 PM, Ranga <sra...@gmail.com> wrote:
>
>> Sorry. It's 1.1.0.
>> After digging a bit more into this, it seems like the OpenCSV deserializer
>> converts all the columns to a String type. This may be throwing the
>> execution off. Planning to create a class and map the rows to this custom
>> class. Will keep this thread updated.
>>
>> On Wed, Oct 8, 2014 at 5:11 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> Which version of Spark are you running?
>>>
>>> On Wed, Oct 8, 2014 at 4:18 PM, Ranga <sra...@gmail.com> wrote:
>>>
>>>> Thanks Michael. Should the cast be done in the source RDD or while
>>>> doing the SUM?
>>>> To give a better picture, here is the code sequence:
>>>>
>>>> val sourceRdd = sql("select ... from source-hive-table")
>>>> sourceRdd.registerAsTable("sourceRDD")
>>>> val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1, c2")
>>>> // The query above throws the exception when I collect the results
>>>>
>>>> I tried adding the cast to the aggRdd query above and that didn't help.
>>>>
>>>> - Ranga
>>>>
>>>> On Wed, Oct 8, 2014 at 3:52 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>>
>>>>> Using SUM on a string should automatically cast the column. Also, you
>>>>> can use CAST to change the datatype
>>>>> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions>.
>>>>>
>>>>> What version of Spark are you running? This could be
>>>>> https://issues.apache.org/jira/browse/SPARK-1994
>>>>>
>>>>> On Wed, Oct 8, 2014 at 3:47 PM, Ranga <sra...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I am in the process of migrating some logic in Pig scripts to
>>>>>> Spark SQL. As part of this process, I am creating a few "Select...Group By"
>>>>>> queries and registering them as tables using the SchemaRDD.registerAsTable
>>>>>> feature.
>>>>>> When using such a registered table in a subsequent "Select...Group By"
>>>>>> query, I get a ClassCastException:
>>>>>> java.lang.ClassCastException: java.lang.String cannot be cast to
>>>>>> java.lang.Integer
>>>>>>
>>>>>> This happens when I use the SUM function on one of the columns. Is
>>>>>> there any way to specify the data type for the columns when the
>>>>>> registerAsTable function is called? Are there other approaches that I
>>>>>> should be looking at?
>>>>>>
>>>>>> Thanks for your help.
>>>>>>
>>>>>> - Ranga
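The exception in this thread is the standard JVM error raised when a value that is actually a java.lang.String is downcast to Integer at runtime, which is what happens when the SerDe hands Spark SQL strings for columns the query plan expects to be integers. A minimal, Spark-free illustration (the variable names are hypothetical):

```java
public class CastDemo {
    public static void main(String[] args) {
        // The SerDe produced a String, but the downstream code expects an Integer.
        Object fromSerDe = "42";

        try {
            Integer value = (Integer) fromSerDe; // fails at runtime, not compile time
            System.out.println(value);
        } catch (ClassCastException e) {
            // e.g. "java.lang.String cannot be cast to java.lang.Integer"
            // (exact wording varies by JVM version)
            System.out.println("ClassCastException: " + e.getMessage());
        }

        // Parsing, rather than casting, is the safe conversion from String.
        int parsed = Integer.parseInt((String) fromSerDe);
        System.out.println(parsed); // prints 42
    }
}
```

This is why CAST in the query alone did not help: the cast operates on what the plan believes is already an integer column, while the underlying rows still contain strings.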