*TL;DR:* I want to run a pre-processing step on the data from each partition
(such as parsing) and retain the parsed object on each node for future
processing calls to avoid repeated parsing.

/More detail:/

I have a server and two nodes in my cluster, and the data is partitioned
across the nodes using HDFS.
I am trying to use Spark to process the data and send back results.

The data is available as text; I would like to parse this text first, and
then run further processing on the parsed form.
To do this, I call a simple:
javaRDD.foreachPartition(new VoidFunction<Iterator<String>>() {
        @Override
        public void call(Iterator<String> i) {
                ParsedData p = new ParsedData(i);
        }
});

I would like to retain this ParsedData object on each node for future
processing calls, so as to avoid parsing all over again. So in my next call,
I'd like to do something like this:

javaRDD.foreachPartition(new VoidFunction<Iterator<String>>() {
        @Override
        public void call(Iterator<String> i) {
                //refer to the previously created ParsedData object
                p.process();
                //accumulate some results
        }
});
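To illustrate the shape of what I'm after outside of Spark: one workaround I'm considering is a per-JVM static holder that parses lazily on the first call and reuses the parsed object afterwards. This is only a plain-Java sketch; `ParsedData`, `ParsedDataHolder`, and the token-list parsing below are hypothetical stand-ins, not my real classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for the real parser: parses once, then serves repeated process() calls.
class ParsedData {
    static int parseCount = 0;                 // tracks how often parsing actually runs
    private final List<String> tokens = new ArrayList<>();

    ParsedData(Iterator<String> lines) {
        parseCount++;
        while (lines.hasNext()) {
            tokens.add(lines.next());
        }
    }

    int process() {
        return tokens.size();                  // placeholder for real processing
    }
}

// Per-JVM holder: each node's JVM keeps one ParsedData, created lazily on first use.
class ParsedDataHolder {
    private static ParsedData instance;

    static synchronized ParsedData get(Iterator<String> lines) {
        if (instance == null) {
            instance = new ParsedData(lines);  // parse only on the first call
        }
        return instance;                       // later calls skip parsing entirely
    }
}

public class Sketch {
    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "b", "c");
        // First call parses; second call reuses the retained object.
        int r1 = ParsedDataHolder.get(partition.iterator()).process();
        int r2 = ParsedDataHolder.get(partition.iterator()).process();
        System.out.println(r1 + " " + r2 + " parses=" + ParsedData.parseCount);
    }
}
```

The open question is whether relying on static state inside the executor JVMs like this is safe and idiomatic in Spark, or whether there is a supported way to keep the parsed object resident per node.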



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Persistent-Local-Node-variables-tp8104.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.