I filed two Jira issues.
1) Performance when querying a nested column
https://issues.apache.org/jira/browse/SPARK-16320
2) PySpark performance
https://issues.apache.org/jira/browse/SPARK-16321
I found existing Jira issues for:
1) Predicate pushdown (PPD) on nested columns
https://issues.apache.org/jira/browse/SPARK-5151
2) Drop of support
The patch we use in production is for 1.5. We're porting the patch to master
(and downstream to 2.0, which is presently very similar) with the intention of
submitting a PR "soon". We'll push it here when it's ready:
https://github.com/VideoAmp/spark-public.
Regarding benchmarking, we have a sui
2016-06-29 23:22 GMT+02:00 Michael Allman:
> I'm sorry I don't have any concrete advice for you, but I hope this helps
> shed some light on the current support in Spark for projection pushdown.
>
> Michael
Michael,
Thanks for the answer. This resolves one of my questions.
Which Spark version you
Hi Maciej,
In Spark, projection pushdown is currently limited to top-level columns
(StructFields). VideoAmp has very large Parquet-based tables (many billions of
records accumulated per day) with deeply nested schemas (four or five levels),
and we've spent a considerable amount of time optimizin