Hi,
I'm think I may have encountered some kind of bug that at the moment prevents
the correct running of my application on a EC2 Cluster.
I'm saying that because the same exact code works wonderfully locally but has a
really strange behaviour on the cluster.
val uri = ssc.textFileStream(args(1) + "/inputData/newData/")
uri.print() // prints perfectly the data
uri.saveAsTextFiles((args(1) + "/uri/textFiles/"), "") // saves the file as
intended
val downloaded = uri.map(s => download(s))
.flatMap(t => t).map(t => createKey(t))
val downloadedAndFiltered = downloaded.filter(t => filterEcho(t))
downloadedAndFilteredAndEchoFiltered(t => t._2).saveAsTextFiles((args(1) +
"/dissected/textFiles/"), "")
// saves an empty file(why???)
downloadedAndFilteredAndEchoFiltered.print() // prints perfectly
// from now all data gets lost, any further call on
downloadedAndFilteredAndEchoFiltered DStream receive empty input
I have no idea of what it could be that breaks down my application, someone
knows if there are some known bug?
Gianluca