I get two types of error:
- org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
- FetchFailedException: Adjusted frame length exceeds 2147483647: 12716268407 - discarded
Spark keeps retrying the job and keeps getting this error.
The file I am extracting sliding-window strings from is 500 MB, and I am
using a window length of 150.
It works fine up to a length of 100.
This is my code:
val hgfasta = sc.textFile(args(0)) // read the FASTA file, one record per line
// every window of length args(2) within each line
val kCount = hgfasta.flatMap(r => r.sliding(args(2).toInt))
// count each window, then sort by count in descending order
val kmerCount = kCount.map(x => (x, 1)).reduceByKey(_ + _)
  .map { case (x, y) => (y, x) }.sortByKey(false).map { case (i, j) => (j, i) }
val filtered = kmerCount.filter(kv => kv._2 < 5) // keep windows seen fewer than 5 times
filtered.map(kv => kv._1 + ", " + kv._2.toLong).saveAsTextFile(args(1))
}
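To make the data volume concrete, here is a minimal local sketch of the same counting logic in plain Scala, without Spark. The object name KmerSketch and the sample strings are made up for illustration. It shows that a line of n characters yields n - k + 1 windows of k characters each, so the records entering the shuffle are many times larger than the input, and that a non-empty line shorter than k still yields one (short) window.

```scala
object KmerSketch {
  // Local stand-in for flatMap(sliding) + map + reduceByKey on an RDD.
  // Hypothetical helper, not part of the original job.
  def countKmers(lines: Seq[String], k: Int): Map[String, Int] =
    lines
      .flatMap(_.sliding(k)) // a line of n >= k chars yields n - k + 1 windows
      .groupBy(identity)
      .map { case (kmer, occurrences) => kmer -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val counts = countKmers(Seq("ACGTACGT", "ACG"), 3)
    // Print the most frequent windows first, in the same "kmer, count" format.
    counts.toSeq.sortBy(-_._2).foreach { case (kmer, n) => println(s"$kmer, $n") }
  }
}
```

Because sc.textFile splits the input into lines, windows in the Spark job are likewise computed per line and never span a line break.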
It gets stuck at flatMap, and saveAsTextFile throws:
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
and
org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 2147483647: 12716268407 - discarded
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
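The two numbers in the FetchFailedException message can be checked with a little arithmetic. The sketch below is plain Scala, and the object and method names are hypothetical. It only confirms that the reported frame (about 11.8 GiB) is larger than 2147483647, which is Int.MaxValue, and computes how many equal pieces it would take to bring each piece under that limit. Treating this as a sign that a single shuffle block grew too large is an assumption based on the message text, not something the trace proves.

```scala
object FrameLengthCheck {
  // Values copied from the exception message.
  val reportedFrameBytes: Long = 12716268407L
  val maxFrameBytes: Long = Int.MaxValue.toLong // 2147483647

  // Smallest number of equal pieces such that each piece fits under the limit
  // (ceiling division on positive longs).
  def minPieces(total: Long, limit: Long): Long = (total + limit - 1) / limit

  def main(args: Array[String]): Unit = {
    val gib = reportedFrameBytes.toDouble / (1L << 30)
    println(f"reported frame: $gib%.2f GiB")
    println(s"exceeds limit: ${reportedFrameBytes > maxFrameBytes}")
    println(s"pieces needed: ${minPieces(reportedFrameBytes, maxFrameBytes)}")
  }
}
```

If the assumption holds, one thing to try is spreading the shuffle over more partitions, for example by passing a partition count as the second argument to reduceByKey, so that each block is smaller. Whether that is enough for this data set is untested here.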