clairemcginty opened a new pull request, #3098:
URL: https://github.com/apache/parquet-java/pull/3098

   ### Rationale for this change
   
   This PR continues the work outlined in #1452. It implements a `size()` 
predicate for filtering on the number of elements in repeated fields:
   
   ```java
   FilterPredicate hasThreeElements = size(intColumn("my_list_field"), Operators.Size.Operator.EQ, 3);
   ```
   
   ### What changes are included in this PR?
   
   `size()` and `not(size())` are implemented for all list fields with a 
**`required` element type**. Attempting to filter on a list of optional 
elements will throw an exception in the schema validator. This is because the 
existing record-level filtering setup 
(`IncrementallyUpdatedFilterPredicateEvaluator`) only feeds non-null values to 
the `ValueInspectors`. Thus, if you had an array `[1, 2, null, 4]`, it would 
only count 3 elements. I can file a ticket to support this eventually, but I 
think we'd have to rework the `FilteringRecordMaterializer` to be aware of 
repetition/definition levels.
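
To make the undercount concrete, here is a minimal plain-Java sketch. It is not parquet-java code: the class `NonNullCountSketch` and the method `countSeenByInspector` are hypothetical names used only to mimic an inspector that is never handed null values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not parquet-java code.
public class NonNullCountSketch {

    // Mimics a value inspector that only ever receives non-null values:
    // null elements never reach it, so they are never counted.
    static int countSeenByInspector(List<Integer> elements) {
        int seen = 0;
        for (Integer element : elements) {
            if (element != null) {
                seen++;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, null, 4);
        System.out.println("actual size:  " + list.size());
        System.out.println("counted size: " + countSeenByInspector(list));
    }
}
```

For the array from the description, the actual size is 4 while the counted size is 3, so a predicate such as "size equals 4" would wrongly reject the record. That mismatch is why lists of optional elements are rejected up front.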
   
   The list group itself can be `optional` or `required`. Null lists are 
treated as having size 0. Again, this is due to the difficulty of 
disambiguating them at the record-level filtering step. (Would love feedback 
on both of these design decisions!)
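
As a rough illustration of the null-list rule, the plain-Java sketch below treats a null list as size 0. It is not parquet-java code: `NullListSizeSketch`, `effectiveSize`, and `sizeEq` are hypothetical names, and the comparison with an empty list is my reading of the rule rather than something the PR states.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is not parquet-java code.
public class NullListSizeSketch {

    // A null list is treated as having size 0, as described above.
    static int effectiveSize(List<Integer> list) {
        return list == null ? 0 : list.size();
    }

    // Stand-in for a "size equals expected" predicate.
    static boolean sizeEq(List<Integer> list, int expected) {
        return effectiveSize(list) == expected;
    }

    public static void main(String[] args) {
        List<Integer> empty = Collections.emptyList();
        System.out.println("null list matches size == 0:  " + sizeEq(null, 0));
        System.out.println("empty list matches size == 0: " + sizeEq(empty, 0));
        System.out.println("3 elements match size == 3:   " + sizeEq(Arrays.asList(1, 2, 3), 3));
    }
}
```

Under this assumption, a "size equals 0" filter matches both null and empty lists, so callers who need to tell the two apart would need a separate null check.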
   
   ### Are these changes tested?
   Unit tests + tested a snapshot build locally with real datasets
   
   ### Are there any user-facing changes?
   New Operators API
   
   Part of #1452
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

