The fix will be included in Spark 1.0. If you just want to apply it to 0.9.1, here's a hotfixed version of 0.9.1 that adds only PR #367:

https://github.com/ankurdave/spark/tree/v0.9.1-handle-empty-partitions

You can clone and build this.
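A minimal sketch of the clone-and-build steps, under some assumptions: the ref name is taken from the URL above, the checkout directory `spark-hotfix` and the function name `build_hotfix` are only illustrative, and the build command is the `sbt/sbt assembly` step from the 0.9.x README. If you read from HDFS, you will probably also need to set `SPARK_HADOOP_VERSION` to match your cluster, which is not shown here.

```shell
# Assumed values: the ref name comes from the GitHub URL above;
# the target directory name is a hypothetical choice.
REPO_URL="https://github.com/ankurdave/spark.git"
BRANCH="v0.9.1-handle-empty-partitions"
TARGET_DIR="spark-hotfix"

build_hotfix() {
  # Fetch only the hotfix ref (--branch accepts a branch or a tag).
  git clone --branch "$BRANCH" "$REPO_URL" "$TARGET_DIR" || return 1
  cd "$TARGET_DIR" || return 1
  # Build the Spark assembly jar with the bundled sbt launcher.
  ./sbt/sbt assembly
}

# To run the clone and build:
#   build_hotfix
```

Nothing runs until you call `build_hotfix`, so you can adjust the variables first. After the build, point your job at the resulting assembly instead of the stock 0.9.1 one.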
Ankur <http://www.ankurdave.com/>

On Thu, May 22, 2014 at 4:53 AM, Zhicharevich, Alex <[email protected]> wrote:
> Hi,
>
> I’m running a simple connected components code using GraphX (version 0.9.1)
>
> My input comes from a HDFS text file partitioned to 400 parts. When I run
> the code on a single part or a small number of files (like 20) the code
> runs fine. As soon as I’m trying to read more files (more than 30) I’m
> getting an error and the job fails.
>
> From looking at the logs I see the following exception
>
> java.util.NoSuchElementException: End of stream
>     at org.apache.spark.util.NextIterator.next(NextIterator.scala:83)
>     at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
>     at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:52)
>     at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:51)
>     at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:456)
>
> From searching the web, I see it’s a known issue with GraphX
> Here: https://github.com/apache/spark/pull/367
> And here: https://github.com/apache/spark/pull/497
>
> Are there some stable releases that include this fix? Should I clone the
> git repo and build it myself? How would you advise me to deal with this
> issue
>
> Thanks,
> Alex
