Re: Spark 2.0 Performance drop

Maciej Bryński Thu, 30 Jun 2016 00:30:07 -0700

I filed two Jira issues:
1) Performance when querying nested columns
https://issues.apache.org/jira/browse/SPARK-16320


2) PySpark performance
https://issues.apache.org/jira/browse/SPARK-16321

I also found existing Jira issues for:
1) Predicate pushdown (PPD) on nested columns
https://issues.apache.org/jira/browse/SPARK-5151

2) Removal of support for df.map etc. in PySpark
https://issues.apache.org/jira/browse/SPARK-13594

2016-06-30 0:47 GMT+02:00 Michael Allman <[email protected]>:
> The patch we use in production is for 1.5. We're porting the patch to master 
> (and downstream to 2.0, which is presently very similar) with the intention 
> of submitting a PR "soon". We'll push it here when it's ready: 
> https://github.com/VideoAmp/spark-public.
>
> Regarding benchmarking, we have a suite of Spark SQL regression tests which 
> we run to check correctness and performance. I can share our findings when I 
> have them.
>
> Cheers,
>
> Michael
>
>> On Jun 29, 2016, at 2:39 PM, Maciej Bryński <[email protected]> wrote:
>>
>> 2016-06-29 23:22 GMT+02:00 Michael Allman <[email protected]>:
>>> I'm sorry I don't have any concrete advice for you, but I hope this helps 
>>> shed some light on the current support in Spark for projection pushdown.
>>>
>>> Michael
>>
>> Michael,
>> Thanks for the answer. This resolves one of my questions.
>> Which Spark version have you patched? 1.6? Are you planning to
>> publish this patch, or is it only for the 2.0 branch?
>>
>> I'd gladly help with some benchmarking in my environment.
>>
>> Regards,
>> --
>> Maciek Bryński
>



-- 
Maciek Bryński
