Alexey Kudinkin created HUDI-3639:
-------------------------------------

             Summary: [Incremental] Add Proper Incremental Records Filtering
support into Hudi's custom RDD
                 Key: HUDI-3639
                 URL: https://issues.apache.org/jira/browse/HUDI-3639
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Alexey Kudinkin


Currently, Hudi's `MergeOnReadIncrementalRelation` relies solely on 
`ParquetFileReader` to filter out, at the record level, the records that don't 
belong to the timeline span being queried.

As a side effect, Hudi has to disable the use of 
[VectorizedParquetReader|https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-vectorized-parquet-reader.html]
 (since using one would prevent records from being filtered by the reader).

 

Instead, we should make sure that proper record-level filtering is performed 
within the returned RDD, rather than relying solely on the file reader to do that.
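
A minimal sketch of what such record-level filtering could look like, written 
in plain Java with a list standing in for the RDD. It assumes that each record 
carries its commit instant (as Hudi's `_hoodie_commit_time` meta field does), 
that instants are fixed-width timestamp strings that compare lexicographically, 
and that the queried span is exclusive at the start and inclusive at the end. 
`Row`, `inSpan` and `filterIncremental` are hypothetical names used for 
illustration, not Hudi APIs.

```java
import java.util.List;
import java.util.stream.Collectors;

public class IncrementalFilterSketch {

    // Hypothetical stand-in for a record produced by the reader;
    // only the commit instant matters for the filtering shown here.
    public static final class Row {
        public final String commitTime;
        public final String key;

        public Row(String commitTime, String key) {
            this.commitTime = commitTime;
            this.key = key;
        }
    }

    // True when the record's commit instant lies in (beginInstant, endInstant].
    // Assumption: instants are fixed-width strings, so String.compareTo
    // orders them chronologically.
    public static boolean inSpan(Row row, String beginInstant, String endInstant) {
        return row.commitTime.compareTo(beginInstant) > 0
            && row.commitTime.compareTo(endInstant) <= 0;
    }

    // Applies the span check to every record after it has been read,
    // instead of expecting the file reader to have dropped it already.
    public static List<Row> filterIncremental(List<Row> rows,
                                              String beginInstant,
                                              String endInstant) {
        return rows.stream()
            .filter(r -> inSpan(r, beginInstant, endInstant))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("20220301000000", "a"),
            new Row("20220310000000", "b"),
            new Row("20220320000000", "c"));

        List<Row> out = filterIncremental(rows, "20220301000000", "20220310000000");
        out.forEach(r -> System.out.println(r.key)); // prints: b
    }
}
```

With the check applied to the records themselves, the result no longer depends 
on which Parquet reader produced them.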



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
