Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jian Feng
The only way I can think of is through some kind of wrapper. For Java/Scala, use JNI. For Python, use C extensions. It should not be a lot of work if you know these tools.
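
As a rough illustration of the JNI route on the JVM side, here is a minimal sketch; the library name "cppstore" and the readBlock method are hypothetical, and the matching C++ implementation of the generated JNI stub is not shown:

    // Scala wrapper around a hypothetical C++ library loaded through JNI
    object CppStore {
      System.loadLibrary("cppstore") // expects libcppstore.so on java.library.path

      // Implemented on the C++ side against the generated JNI header
      @native def readBlock(key: String): Array[Byte]
    }

    // Each executor JVM loads the library once and can then call into C++, e.g.
    // val blocks = sc.parallelize(keys).map(k => CppStore.readBlock(k))

Note that a JNI call like this still copies bytes between the native heap and the JVM heap, so on its own it is not zero-copy.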

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
I’m not sure what point you’re trying to prove and I’m not particularly interested in getting into a protracted discussion. Here is what you wrote: “The architecture of Spark is to run on top of HDFS.” I interpreted that as a statement implying that Spark has to run on HDFS, which is definitely not the case.

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
Hi Annabel, I certainly did read your post. My point was that Spark can read from HDFS but is in no way tied to that storage layer. A very interesting use case that sounds very similar to Jia's (as mentioned by another poster) is contained in https://issues.apache.org/jira/browse/SPARK-10399.

Re: Fastest way to build Spark from scratch

2015-12-07 Thread Jakob Odersky
make-distribution and the second code snippet both create a distribution from a clean state. They therefore require that every source file be compiled and that takes time (you can maybe tweak some settings or use a newer compiler to gain some speed). I'm inferring from your question that for your

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
Annabel, Spark works very well with data stored in HDFS but is certainly not tied to it. Have a look at the wide variety of connectors to things like Cassandra, HBase, etc. Robin
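
A small sketch of that point, assuming an existing SparkContext sc; the path, keyspace and table names are placeholders, and the Cassandra lines need the separate spark-cassandra-connector dependency:

    // Plain local (or any Hadoop-compatible) filesystem, no HDFS required
    val events = sc.textFile("file:///tmp/events.txt")
    println(events.count())

    // Cassandra through the DataStax connector, which adds cassandraTable to SparkContext
    // import com.datastax.spark.connector._
    // val rows = sc.cassandraTable("my_keyspace", "my_table")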

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
Thanks, Annabel, but I may need to clarify that I have no intention to write and run Spark UDFs in C++; I'm just wondering whether Spark can read and write data to a C++ process with zero copy. Best Regards, Jia
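
For what it is worth, one common way to approach zero-copy exchange with a non-JVM process is a shared memory-mapped file; the sketch below only illustrates that idea from the JVM side (it is not a facility Spark provides), and the path and data layout are assumptions:

    import java.io.RandomAccessFile
    import java.nio.ByteOrder
    import java.nio.channels.FileChannel

    // Map a region that the C++ process has mmap'd as well; both sides then share
    // the same physical pages instead of copying buffers back and forth.
    val file = new RandomAccessFile("/dev/shm/shared_block", "rw")
    val buf = file.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, file.length())
    buf.order(ByteOrder.LITTLE_ENDIAN)

    // Read a header field the C++ writer placed at offset 0
    val recordCount = buf.getInt(0)

Turning such a buffer into RDD records still tends to involve a copy onto the JVM heap, which appears to be the gap SPARK-10399 is aimed at.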

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
Hi, Kazuaki, It’s very similar to my requirement, thanks! It seems they want to write to a C++ process with zero copy, and I want to do both read and write with zero copy. Does anyone know how to obtain more information, such as the current status of this JIRA entry? Best Regards, Jia

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
My guess is that Jia wants to run C++ on top of Spark. If that's the case, I'm afraid this is not possible. Spark has support for Java, Python, Scala and R. The best way to achieve this is to run your application in C++ and use the data created by said application to do manipulation within Spark.
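
A tiny sketch of that workflow, assuming the C++ application has already written its output to a placeholder path and a SparkContext sc exists:

    // Whole binary files produced by the C++ job, as (path, stream) pairs
    val blobs = sc.binaryFiles("hdfs:///output/from_cpp")
    val sizes = blobs.mapValues(stream => stream.toArray().length)
    sizes.collect().foreach { case (path, n) => println(s"$path: $n bytes") }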

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Kazuaki Ishizaki
Is this JIRA entry related to what you want? https://issues.apache.org/jira/browse/SPARK-10399 Regards, Kazuaki Ishizaki

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
Thanks, Dewful! My impression is that Tachyon is a very nice in-memory file system that can connect to multiple storage backends. However, because our data is also held in memory, I suspect that connecting to Spark directly may give better performance. But I definitely need to look at Tachyon more.

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Dewful
Maybe looking into something like Tachyon would help. I see some sample C++ bindings, but I'm not sure how much of the current functionality they support...
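
For completeness, a sketch of what the Spark side of the Tachyon route looked like in the 1.x days, assuming a Tachyon master on its default port and a placeholder path (Tachyon has since been renamed Alluxio):

    // Data another process has written into Tachyon is visible through the
    // ordinary Hadoop-compatible filesystem API
    val cached = sc.textFile("tachyon://tachyon-master:19998/datasets/events")
    println(cached.count())

    // Spark 1.x could also keep RDD blocks off-heap in Tachyon:
    // import org.apache.spark.storage.StorageLevel
    // cached.persist(StorageLevel.OFF_HEAP)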

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
Hi, Robin, Thanks for your reply and thanks for copying my question to the user mailing list. Yes, we have a distributed C++ application that will store data on each node in the cluster, and we hope to leverage Spark to do more fancy analytics on that data. But we need high performance, which is why

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
-dev, +user (this is not a question about development of Spark itself, so you’ll get more answers on the user mailing list). First up, let me say that I don’t really know how this could be done; I’m sure it would be possible with enough tinkering, but it’s not clear what you are trying to achieve.

Re: How to debug Spark source using IntelliJ/ Eclipse

2015-12-07 Thread Iulian Dragoș
What errors do you see? I’m using Eclipse and things work pretty much as described (I’m using Scala 2.11 so there’s a slight difference for that, but if you’re fine using Scala 2.10 it should be good to go). One little difference: the sbt command is no longer in the sbt directory; instead run: build/sbt

java.lang.OutOfMemoryError: Java heap space

2015-12-07 Thread Jagadeesan A.S.
Hi dev, We are testing Spark performance with spark-perf. While generating output for python_mllib-perf we are getting the following issue: https://github.com/databricks/spark-perf/issues/92 Max. Heap Size (Estimated): 8.00G -- The following changes were made in spark-perf-maste

Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the new Kafka Consumer API

2015-12-07 Thread Mario Ds Briggs
Sounds sane for a first cut. Since all the creation methods take a kafkaParams map, I was thinking along the lines of maybe a temporary property in there which triggers usage of the new consumer. Thanks, Mario
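
From the caller’s side such a switch might look like the sketch below; the property name is purely hypothetical (it does not exist in Spark), the broker list and topic are placeholders, and an existing StreamingContext ssc is assumed:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map(
      "metadata.broker.list" -> "broker1:9092,broker2:9092",
      // Hypothetical toggle, shown only to illustrate the idea being discussed:
      "spark.streaming.kafka.useNewConsumer" -> "true"
    )

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))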

mlib compilation errors

2015-12-07 Thread wei....@kaiyuandao.com
Hi, when I compile the mllib project in IntelliJ it gives the following errors. If I run mvn from the command line, it works fine. Has anyone run into the same issue? Thanks

Re: query on SVD++

2015-12-07 Thread Robin East
Python bindings for GraphX are still in development - see https://issues.apache.org/jira/browse/SPARK-3789?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20GraphX%20ORDER%20BY%20updated%20DESC
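
In the meantime SVD++ can be driven from the Scala side; below is a minimal sketch against the GraphX API of that era, assuming an existing SparkContext sc and using made-up ratings and parameter values:

    import org.apache.spark.graphx.Edge
    import org.apache.spark.graphx.lib.SVDPlusPlus

    // (user, item, rating) triples as edges of a bipartite graph
    val edges = sc.parallelize(Seq(
      Edge(1L, 101L, 5.0),
      Edge(1L, 102L, 3.0),
      Edge(2L, 101L, 4.0)))

    // rank, iterations, min/max rating, then the gamma learning-rate/regularisation knobs
    val conf = new SVDPlusPlus.Conf(10, 5, 0.0, 5.0, 0.007, 0.007, 0.005, 0.015)
    val (model, meanRating) = SVDPlusPlus.run(edges, conf)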