Which version of Spark are you running?

On Wed, Oct 8, 2014 at 4:18 PM, Ranga <sra...@gmail.com> wrote:
> Thanks Michael. Should the cast be done in the source RDD or while doing
> the SUM?
> To give a better picture, here is the code sequence:
>
> val sourceRdd = sql("select ... from source-hive-table")
> sourceRdd.registerAsTable("sourceRDD")
> val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1, c2")
> // This query throws the exception when I collect the results
>
> I tried adding the cast to the aggRdd query above and that didn't help.
>
>
> - Ranga
>
> On Wed, Oct 8, 2014 at 3:52 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Using SUM on a string should automatically cast the column. You can also
>> use CAST to change the datatype
>> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions>.
>>
>> What version of Spark are you running? This could be
>> https://issues.apache.org/jira/browse/SPARK-1994
>>
>> On Wed, Oct 8, 2014 at 3:47 PM, Ranga <sra...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am in the process of migrating some logic in Pig scripts to Spark SQL.
>>> As part of this process, I am creating a few "Select...Group By" queries
>>> and registering them as tables using the SchemaRDD.registerAsTable
>>> feature. When I use such a registered table in a subsequent
>>> "Select...Group By" query, I get a ClassCastException:
>>> java.lang.ClassCastException: java.lang.String cannot be cast to
>>> java.lang.Integer
>>>
>>> This happens when I use the "Sum" function on one of the columns. Is
>>> there any way to specify the data type for the columns when the
>>> registerAsTable function is called? Are there other approaches I should
>>> be looking at?
>>>
>>> Thanks for your help.
>>>
>>>
>>> - Ranga
>>>
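
For anyone landing on this thread with the same error, here is a minimal
sketch of the explicit CAST workaround Michael describes, written against the
Spark 1.x SchemaRDD API used above. The table and column names
(source_hive_table, c1/c2/c3) are placeholders standing in for Ranga's elided
query, and a spark-shell-style session with a HiveContext is assumed:

// Assumed setup: sc is an existing SparkContext (e.g. from spark-shell).
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
import hiveContext._  // brings sql(...) into scope, as used in the thread

val sourceRdd = sql("select c1, c2, c3 from source_hive_table")
sourceRdd.registerAsTable("sourceRDD")

// Casting c3 inside the aggregate forces a numeric type even when the
// underlying column is stored as a string, which is what triggers the
// java.lang.String -> java.lang.Integer ClassCastException:
val aggRdd = sql("select c1, c2, sum(cast(c3 as int)) from sourceRDD group by c1, c2")
aggRdd.collect().foreach(println)

Note that Ranga reports the cast alone did not help in his case, which is why
Michael's follow-up question about the Spark version (and the pointer to
SPARK-1994) matters: on an affected release, the fix may require an upgrade
rather than a query change.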