Re: How to speed PySpark to match Scala/Java performance

2015-01-29 Thread Sasha Kacanski
Thanks for the quick reply, I will check the link. Hopefully, with the conversion to Python 3 (3.4), we could take advantage of asyncio and other cool new stuff ...

Re: How to speed PySpark to match Scala/Java performance

2015-01-29 Thread Reynold Xin
It is something like this: https://issues.apache.org/jira/browse/SPARK-5097 On the master branch, we have a Pandas-like API already.

Re: How to speed PySpark to match Scala/Java performance

2015-01-29 Thread Sasha Kacanski
Hi Reynold, in my project I want to use the Python API too. When you mention DFs, are we talking about pandas, or is this something internal to the Spark Python API? If you could elaborate a bit on this or point me to alternate documentation, that would help. Thanks much --sasha

Re: How to speed PySpark to match Scala/Java performance

2015-01-29 Thread Reynold Xin
Once the data frame API is released for 1.3, you can write your thing in Python and get the same performance. It can't express everything, but for basic things like projection, filter, join, aggregate and simple numeric computation, it should work pretty well.
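
A minimal sketch of the kind of pipeline Reynold is describing, written against the 1.3-era SQLContext API. The input files and column names (people.json, orders.json, age, id, user_id, country) are made up for illustration; the point is that each operation is declarative, so it is planned and executed in the JVM regardless of which language submits it:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="df-sketch")
    sqlContext = SQLContext(sc)

    # Hypothetical inputs; any schema-bearing source works.
    people = sqlContext.jsonFile("people.json")
    orders = sqlContext.jsonFile("orders.json")

    # Projection, filter, join, and aggregate are expressed as a logical
    # plan, not as Python functions, so no Python workers run per row.
    result = (people.filter(people.age > 21)
                    .join(orders, people.id == orders.user_id)
                    .groupBy(people.country)
                    .count())
    result.show()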

Re: How to speed PySpark to match Scala/Java performance

2015-01-29 Thread Davies Liu
Hey, without making Python as fast as Scala/Java, I think it's impossible to get similar performance in PySpark as in Scala/Java. Jython is also much slower than Scala/Java. With Jython, we can avoid the cost of managing multiple processes and RPC, but we may still need to do the data conversion between Java and Python ...
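
To make the overhead Davies mentions concrete, here is a sketch (assuming nothing beyond a plain SparkContext) of an RDD transformation whose lambda can only run in Python, so every element pays the serialization and RPC cost between the JVM and the Python worker processes:

    from pyspark import SparkContext

    sc = SparkContext(appName="rdd-overhead-sketch")

    # Each element is pickled in the JVM, shipped over a socket to a
    # Python worker, run through the lambda, and serialized back --
    # the per-element conversion cost described above.
    doubled = sc.parallelize(range(1000000)).map(lambda x: x * 2)
    print(doubled.sum())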