clairemcginty opened a new pull request, #3098: URL: https://github.com/apache/parquet-java/pull/3098
### Rationale for this change

This PR continues the work outlined in #1452. It implements a `size()` predicate for filtering on the number of elements in repeated fields:

```java
FilterPredicate hasThreeElements = size(intColumn("my_list_field"), Operators.Size.Operator.EQ, 3);
```

### What changes are included in this PR?

`size()` and `not(size())` are implemented for all list fields with a **`required` element type**. Attempting to filter on a list of optional elements will throw an exception in the schema validator. This is because the existing record-level filtering setup (`IncrementallyUpdatedFilterPredicateEvaluator`) only feeds non-null values to the `ValueInspectors`, so an array such as `[1, 2, null, 4]` would be counted as having only 3 elements. I can file a ticket to support this eventually, but I think we'd have to rework the `FilteringRecordMaterializer` to be aware of repetition/definition levels.

The list group itself can be `optional` or `required`. Null lists are treated as having size 0. Again, this is due to the difficulty of disambiguating null lists from empty ones at the record-level filtering step.

(Would love feedback on both of these design decisions!)

### Are these changes tested?

Unit tests, plus a snapshot build tested locally against real datasets.

### Are there any user-facing changes?

New Operators API.

Part of #1452
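To make the two design decisions concrete, here is a minimal plain-Java sketch of the counting semantics described in this PR. It is not the parquet-java implementation: the class `SizePredicateSketch`, the `Op` enum, and the methods `observedSize` and `matches` are all hypothetical names invented for illustration, and the set of comparison operators is an assumption. It only models what a record-level evaluator would observe if it is fed non-null values only and cannot tell a null list from an empty one.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from parquet-java.
class SizePredicateSketch {

    // Assumed set of comparison operators for this sketch.
    enum Op { EQ, LT, LTE, GT, GTE }

    // Counts what an evaluator that only sees non-null values would count.
    // A null list is treated as size 0, mirroring the behavior described in the PR.
    static int observedSize(List<Integer> list) {
        if (list == null) {
            return 0;
        }
        int count = 0;
        for (Integer value : list) {
            if (value != null) {
                count++; // null elements are never delivered, so they are not counted
            }
        }
        return count;
    }

    // Compares the observed size against the target using the given operator.
    static boolean matches(List<Integer> list, Op op, int target) {
        int n = observedSize(list);
        switch (op) {
            case EQ:  return n == target;
            case LT:  return n < target;
            case LTE: return n <= target;
            case GT:  return n > target;
            case GTE: return n >= target;
            default:  throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        // Four logical elements, but only three are non-null.
        System.out.println(observedSize(Arrays.asList(1, 2, null, 4))); // prints 3
        // A null list is indistinguishable from an empty one in this model.
        System.out.println(matches(null, Op.EQ, 0)); // prints true
    }
}
```

The first call shows why lists with optional elements are rejected up front: a `size == 4` filter would silently miss `[1, 2, null, 4]`. The second shows the trade-off of treating null lists as size 0.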