Spark 3 - Predicate/Projection Pushdown Feature

Pınar Ersoy Tue, 06 Oct 2020 05:03:16 -0700

Hello,

I work as a Data Scientist and use Spark with Python (PySpark) to
develop my models.


Our data model has multiple nested structures (e.g.
attributes.product.feature.name). For the first two levels, I can read them
as follows with PySpark:

*data.where(col('attributes.product') != "")*

However, I cannot read the third level of the nested structure:

*data.where(col('attributes.product.feature') != "")*

I was hoping that the Predicate & Projection Pushdown improvements in
Spark 3 would overcome this problem; however, after testing them I still
cannot read the third level *without flattening the data*.

I would like to ask whether reading nested data (JSON/Parquet) with
*more than two levels* will be improved in the upcoming versions of
Spark 3.

Or, if I am missing something in the Spark versions already released,
please let me know how to proceed.

Kindest Regards,
Pınar Ersoy
