Re: Benchmark Java/Scala/Python for Apache spark

2019-03-11 Thread Reynold Xin
If you use UDFs in Python, you should use Pandas UDFs for better performance.

On Mon, Mar 11, 2019 at 7:50 PM Jonathan Winandy wrote:
> Thanks, I didn't know!
>
> That being said, any UDF use seems to badly affect code generation (and
> the performance).
>
> On Mon, 11 Mar 2019, 15:13 Dy
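As a rough illustration of the Pandas UDF suggestion, here is a minimal sketch. It assumes PySpark with pandas and PyArrow installed; the temperature conversion, the column names, and the helper names `to_fahrenheit` and `run_with_spark` are made up for the example and do not come from the thread.

```python
def to_fahrenheit(celsius):
    """Vectorized conversion: accepts a single float or a whole pandas Series."""
    return celsius * 9.0 / 5.0 + 32.0


def run_with_spark():
    """Wrap the function above as a scalar Pandas UDF and apply it to a column."""
    # Local imports so to_fahrenheit stays usable without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, pandas_udf

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("pandas-udf-sketch")  # hypothetical app name
        .getOrCreate()
    )
    df = spark.createDataFrame([(0.0,), (100.0,)], ["celsius"])

    # With no function type given, pandas_udf creates a scalar Pandas UDF:
    # Spark hands the function a pandas Series per batch instead of calling
    # it once per row, which is where the speedup over a plain UDF comes from.
    to_fahrenheit_udf = pandas_udf(to_fahrenheit, returnType="double")

    df.withColumn("fahrenheit", to_fahrenheit_udf(col("celsius"))).show()
    spark.stop()


if __name__ == "__main__":
    run_with_spark()
```

Because the function body is plain arithmetic, the same code can be checked on ordinary floats before it is handed to Spark.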

Re: Benchmark Java/Scala/Python for Apache spark

2019-03-11 Thread Jonathan Winandy
Thanks, I didn't know! That being said, any UDF use seems to badly affect code generation (and performance).

On Mon, 11 Mar 2019, 15:13 Dylan Guedes wrote:
> Btw, even if you are using Python you can register your UDFs in Scala and
> use them in Python.
>
> On Mon, Mar 11, 2019 at 6:55 AM

Re: Benchmark Java/Scala/Python for Apache spark

2019-03-11 Thread Dylan Guedes
Btw, even if you are using Python, you can register your UDFs in Scala and use them in Python.

On Mon, Mar 11, 2019 at 6:55 AM Jonathan Winandy wrote:
> Hello Snehasish
>
> If you are not using UDFs, you will have very similar performance with
> those languages on SQL.
>
> So it comes down to:
> *
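One way to do what Dylan describes is PySpark's `spark.udf.registerJavaFunction`, sketched below. The class name `com.example.udfs.StringLength` and the SQL name `str_len` are hypothetical; the sketch assumes a Scala class implementing `org.apache.spark.sql.api.java.UDF1` has been compiled and put on the Spark classpath (for example via `--jars`).

```python
def register_scala_udf(spark, name="str_len",
                       java_class="com.example.udfs.StringLength"):
    """Expose a JVM-side UDF to SQL under `name`.

    `java_class` is a hypothetical fully qualified class name; the class must
    already be on the classpath of the running Spark application.
    """
    # The return type is left out here, so Spark infers it from the class.
    spark.udf.registerJavaFunction(name, java_class)
    return name


def example_usage(spark, df):
    """Call the registered function from Python through a SQL expression."""
    register_scala_udf(spark)
    # The UDF body runs inside the JVM, so rows are not shipped to a Python worker.
    return df.selectExpr("value", "str_len(value) AS value_length")
```

Here `spark` is an existing `SparkSession` and `df` is assumed to have a string column called `value`.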

Re: Benchmark Java/Scala/Python for Apache spark

2019-03-11 Thread Jonathan Winandy
Hello Snehasish,

If you are not using UDFs, you will get very similar performance from these languages on SQL.

So it comes down to:
* if you know Python, go for Python.
* if you are used to the JVM, and are ready for a bit of a paradigm shift, go for Scala.

Our team is using Scala, however we help o

Benchmark Java/Scala/Python for Apache spark

2019-03-11 Thread SNEHASISH DUTTA
Hi,

Is there a way to get performance benchmarks for developing applications in Java, Scala, or Python? The use case mostly involves SQL pipelines and data ingested from various sources, including Kafka.

What should be the preferred language? It would be great if the preference for language ca