Thanks, Sean. I'll try to explain what I'm trying to do.
The native component that I'm talking about is native code that I call
using JNI.
I've written a small test
Here, I traverse the collection to call the native component N
(1000) times.
Then I have a result
It means that I'm
Hi Jörn, first of all, thanks for your willingness to help.
This external service is a native component that is stateless and
performs a calculation based on the data I provide. The data is in an RDD.
I have that component on each worker node and I would like to get as
much parallelism as
What if, when I traverse an RDD, I need to calculate values in the dataset by
calling an external (blocking) service? How do you think that could be
achieved?
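
One possible approach, offered only as a sketch: inside a single partition the
data is a plain Iterator, so the blocking calls can be wrapped in Futures there
and collected with Future.sequence. The names callService, processPartition and
parallelism below are hypothetical, and callService is just a stand-in for the
real blocking call. Note that this version materializes the whole partition in
memory before waiting on the results.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingPartition {
  // Hypothetical stand-in for the blocking external/native call.
  def callService(x: Double): Double = {
    Thread.sleep(10) // simulate a blocking call
    x * 2.0
  }

  // Processes one partition: starts one Future per element on a bounded
  // thread pool, then waits for all of them and returns results in order.
  def processPartition(values: Iterator[Double], parallelism: Int = 4): Iterator[Double] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      // toList materializes the partition so every call is submitted up front.
      val futures = values.toList.map(v => Future(callService(v)))
      Await.result(Future.sequence(futures), 1.minute).iterator
    } finally {
      pool.shutdown() // release the threads once the partition is done
    }
  }
}

// Assumed usage on an RDD[Double] named rdd (not shown in the thread):
//   val values = rdd.mapPartitions(it => BlockingPartition.processPartition(it))
```

With this shape the parallelism comes from two places: Spark runs partitions
concurrently across the workers, and the pool runs several blocking calls
concurrently within each partition. The pool size and the one-minute timeout are
arbitrary choices that would need tuning for the real service.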
val values: Future[RDD[Double]] = Future sequence tasks
I've tried to create a list of Futures, but as RDD is not Traversable,
Future.sequence is no
After wasting a lot of time, I've found the problem. Although I haven't used
hadoop/hdfs in my application, the hadoop client matters. The problem was the
hadoop-client version: it was different from the version of hadoop that spark
was built for. Spark's hadoop version was 1.2.1, but in my application that was