Hi everyone,
I have a JavaPairDStream<Integer, String> object and I'd like the Driver to
create a text file (on HDFS) containing all of its elements.
At the moment, I use the coalesce(1, true) method:
    JavaPairDStream<Integer, String> unified = [partitioned stuff]
    unified.foreachRDD(new Function<JavaPairRDD<Integer, String>, Void>() {
        public Void call(JavaPairRDD<Integer, String> arg0) throws Exception {
            arg0.coalesce(1, true).saveAsTextFile(<HDFS path>);
            return null;
        }
    });
but this means that a single worker receives all the data and writes it
to HDFS, and that could be a major bottleneck.
How could I have the Driver do the writing instead of a worker? I read that
collect() might do this, but I have no idea how to implement it.
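For what it's worth, here is the rough shape I imagine, as a sketch only and
not something I have run on a cluster. As far as I understand, the body of
call() in foreachRDD runs on the Driver, and collect() on a JavaPairRDD
returns a java.util.List of Tuple2 elements there, so the Driver would have
to hold a whole batch in memory. The Spark and Hadoop steps are shown only
as comments; the helper DriverSideWriter and its method formatLines are
hypothetical names of mine, and Map.Entry stands in for scala.Tuple2 so that
the block compiles without Spark.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark or Hadoop.
final class DriverSideWriter {
    private DriverSideWriter() {}

    // Builds one "(key,value)" line per element, the same text form that
    // saveAsTextFile produces for a pair RDD via Tuple2.toString().
    // Map.Entry is an assumption used in place of scala.Tuple2.
    static String formatLines(List<Map.Entry<Integer, String>> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> p : pairs) {
            sb.append('(').append(p.getKey()).append(',')
              .append(p.getValue()).append(")\n");
        }
        return sb.toString();
    }
}

// Assumed use inside call(JavaPairRDD<Integer, String> arg0), on the Driver:
//
//   List<Tuple2<Integer, String>> all = arg0.collect();  // data moves to the Driver
//   FileSystem fs = FileSystem.get(new Configuration()); // Hadoop FileSystem handle
//   Writer out = new OutputStreamWriter(
//           fs.create(new Path(<HDFS path>)), "UTF-8");
//   ... write the formatted lines to out, then out.close();
```

With real Tuple2 elements the loop would read the key and value from each
tuple instead of getKey() and getValue(). Each batch would also need its own
output path, otherwise every batch writes to the same file.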
Can anybody help me?
Thanks in advance.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Help-with-collect-in-Spark-Streaming-tp24659.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.