Re: How do you perform blocking IO in apache spark job?

2014-09-08 Thread DrKhu
Thanks, Sean. I'll try to explain what I'm trying to do. The native component I'm talking about is native code that I call using JNI. I've written a small test. Here, I traverse the collection to call the native component N (1000) times. Then I have a result; it means that I'm

Re: How do you perform blocking IO in apache spark job?

2014-09-08 Thread DrKhu
Hi, Jörn, first of all, thanks for your willingness to help. This external service is a stateless native component that performs the calculation based on the data I provide. The data is in an RDD. I have that component on each worker node, and I would like to get as much parallelism as

How do you perform blocking IO in apache spark job?

2014-09-08 Thread DrKhu
What if, when I traverse an RDD, I need to calculate values in the dataset by calling an external (blocking) service? How do you think that could be achieved? val values: Future[RDD[Double]] = Future sequence tasks I've tried to create a list of Futures, but as RDD is not Traversable, Future.sequence is no
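
For readers landing on this question: one common way to approach it, sketched here under assumptions, is to skip Futures and make the blocking call inside the RDD transformation, since Spark already runs one task per partition in parallel. In the sketch below, `callBlockingService` is a hypothetical stand-in for the external service, and the `local[4]` master, the 8 partitions, and the sample data are illustrative choices, not values from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BlockingCallSketch {
  // Hypothetical stand-in for the external blocking service (assumption):
  // it sleeps briefly to simulate latency, then returns a derived value.
  def callBlockingService(x: Double): Double = {
    Thread.sleep(10)
    x * 2.0
  }

  // Each partition is processed by one task, so the blocking calls of
  // different partitions overlap up to the number of available cores.
  def compute(input: RDD[Double]): RDD[Double] =
    input.mapPartitions(iter => iter.map(callBlockingService))

  def main(args: Array[String]): Unit = {
    // local[4] is an illustrative master setting, not from the thread.
    val conf = new SparkConf().setAppName("blocking-io-sketch").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      // 8 partitions is an arbitrary choice; more partitions allow more
      // concurrent blocking calls when enough cores are available.
      val input = sc.parallelize((1 to 1000).map(_.toDouble), 8)
      val values = compute(input).collect()
      println(values.take(5).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

With this shape, the degree of parallelism is governed by the partition count and the cores given to the executors, so repartitioning the RDD is the usual knob to turn, not wrapping the RDD in a Future.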

Re: Standalone spark cluster. Can't submit job programmatically -> java.io.InvalidClassException

2014-09-08 Thread DrKhu
After wasting a lot of time, I've found the problem. Although I don't use Hadoop/HDFS in my application, the Hadoop client matters. The problem was the hadoop-client version: it was different from the Hadoop version Spark was built for. Spark's Hadoop version was 1.2.1, but in my application that was
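
As an illustration of the fix described above, the application's build can pin hadoop-client to the version Spark was built against. This is a minimal sketch assuming an sbt build, which the post does not state; only the 1.2.1 version comes from the message.

```scala
// build.sbt fragment (sbt is an assumption; the poster's build tool is not stated).
// Pin hadoop-client to the Hadoop version the Spark distribution was built for,
// which the message gives as 1.2.1.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.2.1"
```

The same idea applies to any other build tool: the hadoop-client version on the application's classpath should match the one the cluster's Spark build uses.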