Re: Shared memory between C++ process and Spark

Jian Feng Mon, 07 Dec 2015 13:18:13 -0800

The only way I can think of is through some kind of wrapper. For Java/Scala, 
use JNI. For Python, use extensions. It should not be much work if you 
know these tools.
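
As a rough illustration of the wrapper idea on the Python side, here is a minimal sketch using ctypes from the standard library, which is a lighter-weight route than a compiled extension module. It only calls the C library's strlen. The name native_strlen is made up for this example, and loading with CDLL(None) assumes a POSIX host such as Linux or macOS. A real wrapper would load your own shared library instead.

```python
import ctypes

# Assumption: POSIX host. CDLL(None) exposes symbols already loaded into this
# process, which includes the C library on Linux and macOS.
libc = ctypes.CDLL(None)

# Declare the C signature so ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def native_strlen(text: bytes) -> int:
    """Hypothetical wrapper: call the C library's strlen on a bytes object."""
    return libc.strlen(text)


if __name__ == "__main__":
    print(native_strlen(b"spark"))
```

The same pattern (declare argtypes and restype, then call) would apply to functions exported from a C++ library with extern "C" linkage.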

      From: Robin East <robin.e...@xense.co.uk>
 To: Annabel Melongo <melongo_anna...@yahoo.com> 
Cc: Jia <jacqueline...@gmail.com>; Dewful <dew...@gmail.com>; "user @spark" 
<u...@spark.apache.org>; "dev@spark.apache.org" <dev@spark.apache.org>
 Sent: Monday, December 7, 2015 10:57 AM
 Subject: Re: Shared memory between C++ process and Spark

Annabel
Spark works very well with data stored in HDFS but is certainly not tied to it. 
Have a look at the wide variety of connectors to things like Cassandra, HBase, 
etc.
Robin

On 7 Dec 2015, at 18:50, Annabel Melongo <melongo_anna...@yahoo.com> wrote:

Jia,
I'm so confused on this. The architecture of Spark is to run on top of HDFS. 
What you're requesting, reading and writing to a C++ process, is not part of 
that requirement.

    On Monday, December 7, 2015 1:42 PM, Jia <jacqueline...@gmail.com> wrote:

Thanks, Annabel, but I should clarify that I have no intention of writing and 
running Spark UDFs in C++. I'm just wondering whether Spark can read and write 
data to a C++ process with zero copy.
Best Regards,
Jia

On Dec 7, 2015, at 12:26 PM, Annabel Melongo <melongo_anna...@yahoo.com> wrote:

My guess is that Jia wants to run C++ on top of Spark. If that's the case, I'm 
afraid this is not possible. Spark has support for Java, Python, Scala and R.
The best way to achieve this is to run your application in C++ and use the 
data created by said application to do manipulation within Spark. 

    On Monday, December 7, 2015 1:15 PM, Jia <jacqueline...@gmail.com> wrote:

Thanks, Dewful!
My impression is that Tachyon is a very nice in-memory file system that can 
connect to multiple storage systems. However, because our data is also held in 
memory, I suspect that connecting to Spark directly may be more efficient. 
But I definitely need to look at Tachyon more carefully, in case it has a very 
efficient C++ binding mechanism.
Best Regards,
Jia

On Dec 7, 2015, at 11:46 AM, Dewful <dew...@gmail.com> wrote:

Maybe looking into something like Tachyon would help. I see some sample C++ 
bindings, but I'm not sure how much of the current functionality they support...

Hi, Robin,
Thanks for your reply and thanks for copying my question to the user mailing 
list. Yes, we have a distributed C++ application that will store data on each 
node in the cluster, and we hope to leverage Spark to do more fancy analytics 
on that data. But we need high performance, which is why we want shared 
memory. Suggestions will be highly appreciated!
Best Regards,
Jia

On Dec 7, 2015, at 10:54 AM, Robin East <robin.e...@xense.co.uk> wrote:

-dev, +user (this is not a question about development of Spark itself, so you’ll 
get more answers on the user mailing list)
First up, let me say that I don’t really know how this could be done. I’m sure 
it would be possible with enough tinkering, but it’s not clear what you are 
trying to achieve. Spark is a distributed processing system: it has multiple 
JVMs running on different machines that each run a small part of the overall 
processing. Unless you have some sort of idea to have multiple C++ processes 
collocated with the distributed JVMs, using named memory-mapped files doesn’t 
make architectural sense. 
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action, Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action

On 6 Dec 2015, at 20:43, Jia <jacqueline...@gmail.com> wrote:
Dears, for one project, I need to implement something so that Spark can read 
data from a C++ process. 
To provide high performance, I really hope to implement this through shared 
memory between the C++ process and the JVM process.
It seems it may be possible to use named memory-mapped files and JNI to do 
this, but I wonder whether there are any existing efforts or a more efficient 
approach?
Thank you very much!

Best Regards,
Jia
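
The named memory-mapped file idea from the question can be sketched in a few lines. This is only an illustration under assumptions: the file layout (an 8-byte little-endian count followed by that many float64 values) and the names write_region and sum_region are hypothetical, and write_region merely stands in for the C++ producer, which would map the same file itself. The reader maps the file and views the doubles in place rather than copying them into a new buffer.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: 8-byte little-endian record count, then float64 values.
HEADER = struct.Struct("<q")


def write_region(path, values):
    """Stand-in for the native producer: write the header and the doubles."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(values)))
        f.write(struct.pack(f"<{len(values)}d", *values))


def sum_region(path):
    """Map the file read-only and sum the doubles through an in-place view."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
        (count,) = HEADER.unpack_from(region, 0)
        start, end = HEADER.size, HEADER.size + 8 * count
        # cast("d") reinterprets the mapped bytes as native float64 without
        # copying. Assumption: little-endian host, matching the file layout.
        # The views are released before the mapping is closed.
        with memoryview(region) as view, view[start:end].cast("d") as doubles:
            return sum(doubles)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "region.bin")
        write_region(path, [1.5, 2.5, 4.0])
        print(sum_region(path))  # prints 8.0
```

On the JVM side, java.nio's FileChannel.map returns a MappedByteBuffer over such a file, so JNI may not be needed for the read path. Each Spark executor would only see files on its own machine, which is the collocation concern raised in Robin's reply.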

