I looked at Java's mechanism for creating temporary local files. I believe
they can be created, written to, and passed to other programs on the system.
I wrote a proof of concept that sends some Strings out, uses the local
program cat to concatenate them, and writes the result to a local file.
Clearly
I have been looking at Spark-Blast, which calls Blast, a well-known C++
program, in parallel.
In my case I have tried to translate the C++ code to Java but am not
getting the same results; it is convoluted.
I have code that will call the program and read its results - the only real
issue is the p
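The temp-file-plus-cat proof of concept described above might look roughly like
this. This is a minimal sketch, not the poster's actual code; the class and
method names are mine, and cat stands in for whatever external binary you
really want to run:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch: write each String to its own temp file, hand the file paths
// to the local `cat` binary, and collect the concatenated stdout.
public class TempFileCat {

    public static String concatWithCat(List<String> parts)
            throws IOException, InterruptedException {
        // One temp file per input String; deleteOnExit keeps the node tidy.
        String[] cmd = new String[parts.size() + 1];
        cmd[0] = "cat";
        for (int i = 0; i < parts.size(); i++) {
            Path tmp = Files.createTempFile("part", ".txt");
            tmp.toFile().deleteOnExit();
            Files.writeString(tmp, parts.get(i));
            cmd[i + 1] = tmp.toString();
        }
        // Launch cat on all the temp files and read its stdout.
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        String out = new String(p.getInputStream().readAllBytes());
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(concatWithCat(List.of("hello ", "world")));
    }
}
```

Writing the result to a local file is then just a Files.writeString on the
returned String.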
Can you use JNI to call the C++ functionality directly from Java?
Or you could wrap this into a MapReduce step outside Spark and use Hadoop
Streaming (it lets you use shell scripts as mapper and reducer)?
You can also write temporary files for each partition and execute the software
within a map step.
Hello,
You could try using the mapPartitions function if you can send partial data
to your C++ program:
mapPartitions(func):
Similar to map, but runs separately on each partition (block) of the
RDD, so func must be of type Iterator<T> => Iterator<U> when running
on an RDD of type T.
That way you ca
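A minimal sketch of such a per-partition body, in plain Java so it can be
tried without a cluster: cat stands in for the real C++ tool, and the class
and method names are mine. In a Spark job you would wrap this logic in a
FlatMapFunction over the partition iterator and pass it to
JavaRDD.mapPartitions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;

// Sketch of a mapPartitions body: spill the partition's records to a
// temp file, run an external binary on that file, and return the
// binary's output lines as the new partition.
public class PartitionPipe {

    public static Iterator<String> pipePartition(Iterator<String> records,
                                                 String binary)
            throws IOException, InterruptedException {
        // Write the whole partition to one temp file, one record per line.
        Path in = Files.createTempFile("partition", ".in");
        in.toFile().deleteOnExit();
        StringBuilder sb = new StringBuilder();
        while (records.hasNext()) {
            sb.append(records.next()).append('\n');
        }
        Files.writeString(in, sb.toString());

        // Run the external program on the file and collect its stdout.
        Process p = new ProcessBuilder(binary, in.toString())
                .redirectErrorStream(true).start();
        List<String> out = new String(p.getInputStream().readAllBytes())
                .lines().toList();
        p.waitFor();
        return out.iterator();
    }
}
```

The cost of the external process launch is then paid once per partition
rather than once per record.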
I have a problem where a critical step needs to be performed by a
third-party C++ application. I can send or install this program on the worker
nodes. I can construct a function holding all the data this program needs
to process. The problem is that the program is designed to read and write
from