Kryo doesn't support Guava's collections by default. I remember encountering a project on GitHub that fixes this (not sure though). I've ended up avoiding Guava collections wherever Spark RDDs are concerned.
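For reference, the usual fix is to register Guava-aware serializers through Spark's Kryo registrator hook. A minimal sketch, assuming the de.javakaffee/kryo-serializers library (possibly the GitHub project referred to above) is on the classpath:

    import com.esotericsoftware.kryo.Kryo
    import de.javakaffee.kryoserializers.guava.ImmutableListSerializer
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Registers a serializer that rebuilds an ImmutableList on read,
    // instead of calling the unsupported add() on an immutable instance
    // the way Kryo's generic CollectionSerializer does.
    class GuavaKryoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        ImmutableListSerializer.registerSerializers(kryo)
      }
    }

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", classOf[GuavaKryoRegistrator].getName)

Avoiding Guava here can also be as simple as converting to plain Scala lists before parallelizing, e.g. sc.parallelize(arr.map(_.asScala.toList)) with scala.collection.JavaConverters._ imported, which sidesteps Kryo's generic CollectionSerializer entirely.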
On 5 October 2015 at 21:04, Jakub Dubovsky <spark.dubovsky.ja...@seznam.cz> wrote:
> Hi all,
>
> I would like advice on how to use ImmutableList with an RDD. A small
> demonstration of the essence of my problem in spark-shell, with the guava
> jar added:
>
> scala> import com.google.common.collect.ImmutableList
> import com.google.common.collect.ImmutableList
>
> scala> val arr = Array(ImmutableList.of(1,2), ImmutableList.of(2,4), ImmutableList.of(3,6))
> arr: Array[com.google.common.collect.ImmutableList[Int]] = Array([1, 2], [2, 4], [3, 6])
>
> scala> val rdd = sc.parallelize(arr)
> rdd: org.apache.spark.rdd.RDD[com.google.common.collect.ImmutableList[Int]] = ParallelCollectionRDD[0] at parallelize at <console>:24
>
> scala> rdd.count
>
> This results in a Kryo exception saying that it cannot add a new element
> to the list instance during deserialization:
>
> java.io.IOException: java.lang.UnsupportedOperationException
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
>         at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
>         ...
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.UnsupportedOperationException
>         at com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:91)
>         at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
>         at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>         ...
>
> It somehow makes sense, but I cannot think of a workaround, and I do not
> believe that using ImmutableList with an RDD is impossible. How is this
> solved?
>
> Thank you in advance!
>
> Jakub Dubovsky