You can run this in the spark-shell:

*CODE:*

        // Needed for udf, col and lit; spark.implicits._ (for the
        // $"..." syntax) is already in scope in the spark-shell.
        import org.apache.spark.sql.functions._

        case class InstanceData(service: String, metric: String,
          zone: String, source: String, time: Long, value: Double)

        val seq = sc.parallelize(Seq(
          InstanceData("serviceA", "metricA", "zoneA", "sourceA", 1000L, 1.0),
          InstanceData("serviceA", "metricA", "zoneA", "hostA", 1000L, 1.0),
          InstanceData("serviceD", "metricA", "zoneB", "hostB", 1000L, 2.0),
          InstanceData("serviceA", "metricF", "zoneA", "hostB", 1000L, 1.0)
        ))

        val instData = spark.createDataFrame(seq)

        // Wrap each row's value in a one-entry map keyed by the
        // (service, metric) tuple.
        def makeMap = udf((service: String, metric: String, value: Double) =>
          Map((service, metric) -> value))

        val instDF = instData.withColumn("metricMap",
          makeMap($"service", $"metric", $"value"))

        // Divide every value in the map by count.
        def avgMapValueUDF = udf((newMap: Map[(String, String), Double],
          count: Long) =>
          newMap.map { case (keyTuple, sum) => (keyTuple, sum / count.toDouble) }
        )

        // The final .show should reproduce the ClassCastException
        // quoted below.
        instDF.withColumn("customMap",
          avgMapValueUDF(col("metricMap"), lit(1))).show
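
For context: the cast fails because Spark represents struct values as Row
objects at runtime, so the keys of metricMap come back as
GenericRowWithSchema instances rather than scala.Tuple2, and getAs is only
an unchecked cast. A minimal sketch of a workaround (my own, not from the
thread; tupleKeyed is an illustrative name, assuming Spark 2.x behavior):
drop to the RDD API, read the keys as Rows, and rebuild the tuples by hand.

        import org.apache.spark.sql.Row

        // Each struct key arrives as a Row; convert it back into a
        // (String, String) tuple before using the map.
        val tupleKeyed = instDF.rdd.map { row =>
          val m = row.getAs[Map[Row, Double]]("metricMap")
          m.map { case (k, v) => ((k.getString(0), k.getString(1)), v) }
        }

        tupleKeyed.collect().foreach(println)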


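If changing the schema is an option, a simpler route (again just a sketch;
makeStringKeyMap and flatDF are illustrative names) is to avoid struct keys
altogether, since plain string keys round-trip through getAs without any
casting:

        // Key the map by a single concatenated string instead of a tuple.
        def makeStringKeyMap = udf((service: String, metric: String, value: Double) =>
          Map(s"$service:$metric" -> value))

        val flatDF = instData.withColumn("metricMap",
          makeStringKeyMap($"service", $"metric", $"value"))

        flatDF.select($"metricMap").show(truncate = false)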

On Mon, Mar 26, 2018 at 11:51 PM, Shmuel Blitz <shmuel.bl...@similarweb.com>
wrote:

> Hi Nikhil,
>
> Can you please put a code snippet that reproduces the issue?
>
> Shmuel
>
> On Tue, Mar 27, 2018 at 12:55 AM, Nikhil Goyal <nownik...@gmail.com>
> wrote:
>
>>  |-- myMap: map (nullable = true)
>>  |    |-- key: struct
>>  |    |    |-- _1: string (nullable = true)
>>  |    |    |-- _2: string (nullable = true)
>>  |    |-- value: double (valueContainsNull = true)
>>  |-- count: long (nullable = true)
>>
>> On Mon, Mar 26, 2018 at 1:41 PM, Gauthier Feuillen <gauth...@dataroots.io> wrote:
>>
>>> Can you give the output of “printSchema”?
>>>
>>>
>>> On 26 Mar 2018, at 22:39, Nikhil Goyal <nownik...@gmail.com> wrote:
>>>
>>> Hi guys,
>>>
>>> I have a Map[(String, String), Double] as one of my columns. Using
>>>
>>> input.getAs[Map[(String, String), Double]](0)
>>>
>>> throws an exception:
>>>
>>> Caused by: java.lang.ClassCastException:
>>> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
>>> cannot be cast to scala.Tuple2
>>>
>>> Even the schema says that the key is a struct of (string, string).
>>>
>>> Any idea why this is happening?
>>>
>>>
>>> Thanks
>>>
>>> Nikhil
>>>
>>>
>>>
>>
>
>
> --
> Shmuel Blitz
> Big Data Developer
> Email: shmuel.bl...@similarweb.com
> www.similarweb.com
>
