FWIW I suspect that each count operation is an opportunity for you to trigger the bug, and each filter operation increases the likelihood of setting up the bug. I normally don't come across this error until my job has been running for an hour or two and has had a chance to build up longer lineages for some RDDs. It sounds like your data is a bit smaller, so you can build up long lineages much more quickly.
If you can reduce your number of filter operations (for example by combining several of them into a single function) that may help. It may also help to introduce persistence or checkpointing at intermediate stages so that the lineages that have to get replayed don't grow as long; a rough sketch of what I mean is at the bottom of this message.

On Fri, Sep 26, 2014 at 11:10 AM, Arun Ahuja <aahuj...@gmail.com> wrote:
> No, for me as well it is non-deterministic. It happens in a piece of code
> that does many filters and counts on a small set of records (~1k-10k). The
> original set is persisted in memory and we have a Kryo serializer set for
> it. The task itself takes in just a few filtering parameters. This, with
> the same settings, has sometimes completed successfully and sometimes failed
> during this step.
>
> Arun
>
> On Fri, Sep 26, 2014 at 1:32 PM, Brad Miller <bmill...@eecs.berkeley.edu>
> wrote:
>
>> I've had multiple jobs crash due to "java.io.IOException: unexpected
>> exception type"; I've been running the 1.1 branch for some time and am now
>> running the 1.1 release binaries. Note that I only use PySpark. I haven't
>> kept detailed notes or the tracebacks around since there are other problems
>> that have caused me greater grief (namely "key not found" errors).
>>
>> For me the exception seems to occur non-deterministically, which is a bit
>> interesting since the error message shows that the same stage has failed
>> multiple times. Are you able to consistently reproduce the bug across
>> multiple invocations at the same place?
>>
>> On Fri, Sep 26, 2014 at 6:11 AM, Arun Ahuja <aahuj...@gmail.com> wrote:
>>
>>> Has anyone else seen this error in task deserialization? The task is
>>> processing a small amount of data and doesn't seem to have much data
>>> hanging off the closure. I've only seen this with Spark 1.1.
>>>
>>> Job aborted due to stage failure: Task 975 in stage 8.0 failed 4 times,
>>> most recent failure: Lost task 975.3 in stage 8.0 (TID 24777, host.com):
>>> java.io.IOException: unexpected exception type
>>>     java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
>>>     java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025)
>>>     java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>>>     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>     java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>     org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>>     org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>>     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159)
>>>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     java.lang.Thread.run(Thread.java:744)
>>>
>>
>
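Here's the rough sketch I mentioned above, in PySpark (since that's what I use). The RDD, the predicates, and the checkpoint path below are just placeholders for illustration; adapt them to your own job.

from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="lineage-trimming-sketch")
sc.setCheckpointDir("/tmp/spark-checkpoints")  # any HDFS/local path you control

# Stand-in for your ~1k-10k record RDD.
records = sc.parallelize(range(10000))

# Instead of chaining many .filter() calls, each of which adds a step to the
# lineage, e.g.:
#   filtered = records.filter(p1).filter(p2).filter(p3)
# fold the predicates into one function so there is a single filter step:
def combined_predicate(x):
    # hypothetical predicates, purely for illustration
    return (x % 2 == 0) and (x % 3 == 0) and (x > 100)

filtered = records.filter(combined_predicate)

# Persist the intermediate result so repeated counts reuse it instead of
# replaying the lineage, and checkpoint it to truncate the lineage entirely.
filtered.persist(StorageLevel.MEMORY_ONLY)
filtered.checkpoint()

print(filtered.count())  # first action materializes and checkpoints the RDD
print(filtered.count())  # later actions read the persisted/checkpointed data

Persisting before checkpointing avoids recomputing the RDD when the checkpoint is written; after that, failed tasks replay from the checkpoint rather than from the original lineage.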