Hi,
I'm running a simple connected components job using GraphX (version 0.9.1).
My input comes from an HDFS text file split into 400 parts. When I run the
code on a single part or a small number of parts (around 20), it runs fine.
As soon as I try to read more parts (more than 30), I get an error
and the job fails.
From looking at the logs, I see the following exception:
java.util.NoSuchElementException: End of stream
at org.apache.spark.util.NextIterator.next(NextIterator.scala:83)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:52)
at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:51)
at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:456)
From searching the web, I see it's a known issue with GraphX:
Here: https://github.com/apache/spark/pull/367
And here: https://github.com/apache/spark/pull/497
Is there a stable release that includes this fix? Or should I clone the git
repo and build it myself? How would you advise me to deal with this issue?
Thanks,
Alex