Re: [EXTERNAL] RDD.pipe() for binary data

2022-07-16 Thread Andrew Melo
I'm curious about using shared memory to speed up the JVM->Python round trip. Is there any sane way to do anonymous shared memory in Java/Scala? On Sat, Jul 16, 2022 at 16:10 Sebastian Piu wrote: > Other alternatives are to look at how PythonRDD does it in Spark, you > could also try to go for a
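
For illustration only, here is a minimal sketch of the Python half of a shared-memory hand-off. It does not answer the question about anonymous shared memory: it swaps in a named, file-backed mapping, on the assumption that the JVM side maps the same file with FileChannel.map (whose buffers default to big-endian, hence the ">I" header). The region layout (4-byte length, then payload) and the names create_region, write_payload and read_payload are hypothetical, and there is no synchronization between writer and reader.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct(">I")  # 4-byte big-endian payload length (assumed layout)


def create_region(path, size):
    """Create a fixed-size backing file that both processes can map."""
    with open(path, "wb") as f:
        f.truncate(size)


def write_payload(path, payload):
    """Copy raw bytes into the mapped region, then publish their length."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as region:
        if HEADER.size + len(payload) > len(region):
            raise ValueError("payload does not fit in the mapped region")
        region[HEADER.size:HEADER.size + len(payload)] = payload
        region[:HEADER.size] = HEADER.pack(len(payload))


def read_payload(path):
    """Read the length header, then return that many payload bytes."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as region:
        (length,) = HEADER.unpack(region[:HEADER.size])
        return bytes(region[HEADER.size:HEADER.size + length])


if __name__ == "__main__":
    # A temp directory keeps the demo portable; on Linux a path under
    # /dev/shm would keep the file in RAM.
    path = os.path.join(tempfile.mkdtemp(), "region.bin")
    create_region(path, 4096)
    write_payload(path, b"\x00\x01binary\xff")
    print(read_payload(path))
```

A real setup would still need some signal (a socket, a pipe, or a flag in the region) to tell the other process that a payload is ready.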

Re: [EXTERNAL] RDD.pipe() for binary data

2022-07-16 Thread Sebastian Piu
Other alternatives are to look at how PythonRDD does it in Spark; you could also try to go for a more traditional setup where you expose your Python functions behind a local/remote service and call that from Scala - say over Thrift/gRPC/HTTP/a local socket etc. Another option, but I've never done it
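
As a rough sketch of the "local socket" variant, the Python side could look like the following. Everything here is an assumption for illustration: the framing (a 4-byte big-endian length followed by the raw bytes, which a Scala client could produce with DataOutputStream.writeInt), the placeholder transform function, and the helper names. It is not how PythonRDD does it.

```python
import socket
import socketserver
import struct
import threading


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def send_frame(sock, payload):
    # 4-byte big-endian length prefix, then the raw bytes.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_frame(sock):
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def transform(payload):
    # Hypothetical stand-in for the real Python function.
    return payload[::-1]


class FrameHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Answer one frame per request until the client disconnects.
        try:
            while True:
                send_frame(self.request, transform(recv_frame(self.request)))
        except ConnectionError:
            pass


def start_server(host="127.0.0.1", port=0):
    """Serve on a background thread; port 0 lets the OS pick a free port."""
    server = socketserver.ThreadingTCPServer((host, port), FrameHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(address, payload):
    """Client helper that mirrors what the Scala caller would do."""
    with socket.create_connection(address) as sock:
        send_frame(sock, payload)
        return recv_frame(sock)
```

Length-prefixed frames avoid the newline handling that line-oriented text protocols impose on binary payloads. Opening one connection per partition rather than per record would likely matter for throughput, though that is untested here.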

Re: [EXTERNAL] RDD.pipe() for binary data

2022-07-16 Thread Sean Owen
Use GraphFrames? On Sat, Jul 16, 2022 at 3:54 PM Yuhao Zhang wrote: > Hi Shay, > > Thanks for your reply! I would very much like to use PySpark. However, my > project depends on GraphX, which is only available in the Scala API as far > as I know. So I'm locked into Scala and trying to find a way

Re: [EXTERNAL] RDD.pipe() for binary data

2022-07-16 Thread Yuhao Zhang
Hi Shay, Thanks for your reply! I would very much like to use PySpark. However, my project depends on GraphX, which is only available in the Scala API as far as I know. So I'm locked into Scala and trying to find a way out. I wonder if there's a way to work around it. Best regards, Yuhao Zhang On

Re: [EXTERNAL] RDD.pipe() for binary data

2022-07-10 Thread Shay Elbaz
Yuhao, You can use PySpark as the entry point to your application. With Py4J you can call Java/Scala functions from the Python application. There's no need to use the pipe() function for that. Shay From: Yuhao Zhang Sent: Saturday, July 9, 2022 4:13:42 AM To: use